MASSACHUSETTS INSTITUTE OF TECHNOLOGY 
ARTIFICIAL INTELLIGENCE LABORATORY 

A.I. Memo No. 788 July, 1984 



TOWARD A PRINCIPLE-BASED PARSER 
G. Edward Barton, Jr. 



ABSTRACT: 

Parser design lags behind linguistic theory. While modern transformational grammar
has largely abandoned complex, language-specific rule systems in favor of modular
subsystems of principles and parameters, the rule systems that underlie existing
natural-language parsers are still large, detailed, and complicated. The shift to modular
theories in linguistics took place because of the scientific disadvantages of such rule
systems. Those scientific ills translate into engineering maladies that make building
natural-language systems difficult.

The cure for these problems should be the same in parser design as it was in linguistic
theory. The shift to modular theories of syntax should be replicated in parsing practice; a
parser should base its actions on interacting modules of principles and parameters rather
than a complex, monolithic rule system. If it can be successfully carried out, the shift will
make it easier to build natural-language systems because it will shorten and simplify the
language descriptions that are needed for parsing. It will also allow parser design to track
new developments in linguistic theory.



This report describes research done at the Artificial Intelligence Laboratory of the
Massachusetts Institute of Technology. Support for the Laboratory's artificial intelligence
research has been provided in part by the Advanced Research Projects Agency of the
Department of Defense under Office of Naval Research contract N00014-80-C-0505. This is a
revised version of a Ph.D. thesis proposal submitted to the Department of Electrical
Engineering and Computer Science on February 28, 1981. Support for the author's graduate
studies has been provided by the Fannie and John Hertz Foundation.

©Massachusetts Institute of Technology, 1984



Principle-Based Parsing    Preview

1. Preview 

The focus of linguistic theory has shifted away from complex rule systems to modular
systems of principles, but the practice of parser design has not kept pace. Natural-language
parsers are still built on complex rule systems. Few implementation models are known
for the new theories of grammar, and those that do exist fail to preserve their modular
organization. Research is needed on how to embody the new theories in parsers.

1.1. Linguistic theory and parsing practice 

The human ability to use and understand a language depends in part on knowledge of
the syntactic structure of its sentences. Native speakers of the language learn its syntax by
acquiring some mentally represented system of rules and principles. Their syntactic abilities
result from possessing both such a grammar and implicit knowledge of how to put it to use.
It is the business of generative linguistics to identify the rules and principles and explain
how they are acquired and used.

A natural-language program is designed to approximate part of the human ability
to use and understand natural language. Since it too must be sensitive to the syntactic
structure of sentences, the program must be based on some approximation to the system of
rules and principles that linguistics is striving to identify. Given this intimate connection
between linguistics and the design of natural-language programs, it is natural to expect
that parsing practice should closely track developments in linguistic theory. As linguistics
provides better accounts of the rules and principles that define natural-language syntax,
they can be embodied in programs that use better approximations to linguistic reality.

However, recent theoretical shifts in linguistics have not been matched by corresponding
developments in the practice of parser design. Under early theories of transformational
grammar, each language was described by a large system of complicated rules that
meticulously spelled out the details of their operation. In contrast, new theories suggest that
complicated language-specific rule systems do not form an important part of a person's
syntactic knowledge. The focus of linguistic theory has shifted to the study of modular
subsystems of grammatical principles and parameters.

1.2. Replicating the shift to modular syntax 

A closer look at linguistic theory shows that there were good reasons for this shift.
Early grammatical theories suffered from several scientific ills. Their detailed rule systems
seemed derivative rather than fundamental. They were more stipulative than explanatory,
they made weak claims about the nature of natural language, and they made language
acquisition seem a mind-boggling task. The new modular theories cured these ills. By
untangling the effects of separate underlying subsystems of grammar, they were able to
come closer to uncovering the principles that form the true basis of syntactic knowledge.

A corresponding look at the practice of parser design confirms that no such shift has
taken place there. Each language is still described by a large system of complicated rules
that spell out the details of their operation. Such complicated rule systems are no more
desirable in engineering than they were in science, for the scientific ills that afflicted them
in linguistic theory can translate to engineering maladies in parser design. They make it
difficult to build natural-language systems.

The cure for these engineering maladies should be the same in parser design as it was
in linguistic theory. It should be possible for a parser to base its actions on interacting
principles and parameters instead of complicated rule systems. The development of such a
parser would replicate in parsing practice the shift to modular theories of syntax. Just as
the shift toward modularity simplified linguistic theories, it would shorten and simplify the
descriptions of particular languages that are needed for parsers. It would thus make such
descriptions easier to write.

1.3. A roadmap 

Section 2 will sketch the logical relationship that binds together natural-language
programs, theories of grammar, and linguistics. Section 3 will characterize the language
descriptions that were used in old-style syntactic theories and point out their scientific
disadvantages. Section 4 will show that the language descriptions used in current
natural-language programs have largely the same character and possess a corresponding set of
engineering disadvantages. Section 5 will describe the theoretical shift in linguistics that
cured the scientific ills of earlier theories, while section 6 will detail the proposal that the
shift should be replicated in the design of natural-language systems. Section 7 will tentatively
describe some possible design characteristics of a principle-based parser, and Section 8 will
discuss the implementation technique of representing theoretical predicates and constraints
implicitly in parser operation rather than explicitly in data structures. Section 9 will mention
related earlier work, while section 10 will suggest a rough plan by which a principle-based
parser might be developed.



2. The logical nature of natural-language parsing 

The arrangement of words in a sentence matters as much as what the words are:¹

(1) (i) Fred killed the spider
    (ii) the spider killed Fred

(2) (i) fatal accidents deter careful drivers
    (ii) deter drivers accidents fatal careful

(3) (i) I told Fred a ghost story
    (ii) I told Fred a ghost story was the last thing I wanted to hear

A program that interprets sentences must know the syntactic structure of a language in
addition to the import of its words.² ³ As one of its constituents such a program must have
a parser that recovers the structure of input sentences.

¹ The first four of these examples are from Baddeley (1976:3J0).

In many cases the parser is a separate component that builds an explicit tree-like
representation of syntactic structure. In other cases no explicit syntactic representation
is built; the recovery of structure is intertwined with the process of semantic interpretation
and the parsing component is only implicit. Either way, though, the parsing process
analyzes an input sentence according to some theory of linguistic structure. The parser
implicitly embodies this theory, supplementing grammatical knowledge with a way of using
that knowledge to analyze sentences.

2.1. The definitive account of natural-language syntax 

Any parser is implicitly based on a linguistic theory. Since only a theory of syntax
can specify what syntactic structure the parser should assign to a sentence, it is a syntactic
theory that defines the computational problem a parser must solve. The defining role of a
syntactic theory makes the choice of syntactic theory important for parsing and
natural-language processing.

Humans, not machines, speak definitive "natural language." The syntactic theory that
is ultimately correct will be the one that succeeds at describing the tacit knowledge of
linguistic structure that underlies a human speaker's syntactic abilities. Characterizing this
tacit knowledge has long been a goal of generative linguistics.

According to linguistic theory, such knowledge takes the form of a mentally represented
system of rules and principles that generate and relate various kinds of mental
representations. Making up a mentally represented grammar, these rules and principles enter into
various unconscious mental computations that are carried out in the process of producing
and understanding sentences.

A natural-language program does not have to be based on the same system of rules
and principles that is represented in the mind of a human speaker; only an approximation
to that system is required. Indeed, the exact details of the human system are not currently
known. Nevertheless, it is the mentally represented grammar that is the ultimate standard
defining the language a person speaks. When the program grammar disagrees with the
human grammar about the relation between sound (or orthography) and meaning, it is the
human grammar that is correct.

² On one possible account of the linguistic deficit involved in Broca's aphasia, the occasional
comprehension difficulties of Broca's aphasics illustrate the importance of syntax. Lightfoot
(1982:188f) comments:

[Experiments] ... found that these patients could understand a sentence like The apple
that the boy is eating is red, where the relations among the major words are constrained by
our knowledge of the world: apples but not boys are red; boys eat apples, and not vice
versa. A sentence like this can be understood without reliance on the function words
and without having to analyze the structure of the sentence in any detailed way ....
On the other hand, a sentence like The girl that the boy is chasing is tall is more difficult.
Both girls and boys may be tall, and not only can girls chase boys, but also boys can
chase girls. In order to understand such a sentence, one needs to be able to conduct a
detailed analysis, identifying the proper role of function words like the, that, and is. This
is beyond the capacity of Broca's aphasics, and they do not understand such sentences
in the way that normals do .... In short, Broca's aphasics cope well with sentences
where their knowledge of the world can get them by. They do very badly when they
must rely on a syntactic analysis of the sentence in order to know what it means.

³ Roger Schank, quoted in Winston and Prendergast (1984:lGCf), expresses quite a contrary
view of the importance of syntax: "I think ... that research on syntax should have stopped
fifteen years ago .... [S]yntax is not worth working on ...." If the natural-language systems
that Schank advocates truly ignore syntax, they can be expected to have the kind of
comprehension deficit that Lightfoot says Broca's aphasics have (see previous note).

If there is too much disparity between natural language and the version of natural
language that a program accepts, the program's linguistic behavior can be so frustrating as
to make it useless. A program with faulty "knowledge of language" can impose inappropriate
interpretations on seemingly clear inputs. If its knowledge describes too few constructions,
it can also place irritating, seemingly arbitrary restrictions on the range of syntactic forms
it will accept.

2.2. The logical problem of parsing 

Given a linguistic theory, how does it constrain the operation of a corresponding parser?
The parser cannot simply read off the syntactic structure of a sentence from its surface form,
since surface form does not explicitly indicate that structure. Rather, the parser must use its
implicit knowledge of language in an active way to guide the recovery of syntactic structure.

From an abstract point of view, the task of the parser is to find a full syntactic
representation that satisfies two conditions: the representation must be well-formed according to the
linguistic theory that the parser embodies, and the surface form of the representation must
be consistent with the input sentence. In many cases, it is likely that two possible syntactic
representations will be well-formed under the theory and consistent with the sentence:

(4) visiting relatives can be boring

Therefore, two parses will be possible. Sentences like (4) will hence be syntactically
ambiguous.
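
The two conditions on a parse can be illustrated with a minimal Python sketch. The toy grammar, the category names (Adj, Ger, Pred, and so on), and the function name parses are all assumptions made for illustration; they are not drawn from any particular linguistic theory. The sketch enumerates every tree that is well-formed under the toy grammar and whose leaves spell out the input, and for sentence (4) it finds two.

```python
from functools import lru_cache

# Hypothetical toy grammar: binary rules plus a lexicon (illustrative only).
RULES = {
    "S": [("NP", "VP")],
    "NP": [("Adj", "N"),   # "visiting" modifies "relatives"
           ("Ger", "N")],  # "visiting" is a gerund taking "relatives" as object
    "VP": [("Aux", "Pred")],
    "Pred": [("Cop", "Adj")],
}
LEXICON = {
    "visiting": {"Adj", "Ger"},
    "relatives": {"N"},
    "can": {"Aux"},
    "be": {"Cop"},
    "boring": {"Adj"},
}

def parses(words):
    """Return every tree rooted in S whose leaves are exactly `words`."""
    words = tuple(words)

    @lru_cache(maxsize=None)
    def build(cat, i, j):
        trees = []
        # A single word may be analyzed directly by the lexicon.
        if j - i == 1 and cat in LEXICON.get(words[i], ()):
            trees.append((cat, words[i]))
        # Otherwise try every binary rule at every split point.
        for left, right in RULES.get(cat, ()):
            for k in range(i + 1, j):
                for lt in build(left, i, k):
                    for rt in build(right, k, j):
                        trees.append((cat, lt, rt))
        return tuple(trees)

    return build("S", 0, len(words))

for tree in parses("visiting relatives can be boring".split()):
    print(tree)
```

The two trees differ only in the category assigned to "visiting", which is one simple way an ambiguity of this kind might be represented.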

From the most neutral logical point of view, a theory of syntax does not constrain
parser operation beyond this simple input/output relationship. Clearly, then, a theory of
syntax does not completely determine a parser. In addition to knowing the possible syntactic
structures of a language, the parser must possess an effective method of putting syntactic
knowledge to use in actual sentence processing. An LR(k) parser and an implementation
of Earley's algorithm may both use the same context-free grammar and hence share the
same linguistic knowledge. Although they will solve the same parsing problem, they will
operate differently because they have different methods of putting their grammars to use
in sentence processing. A theory of grammar is a theory of grammatical competence, while
the operation of a parser also includes aspects of grammatical performance.
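
The point that one grammar can be put to use in different ways can be sketched in Python. The example below does not implement LR(k) or Earley's algorithm; as a simpler stand-in it pairs a top-down backtracking recognizer with a bottom-up CYK-style chart recognizer. The toy grammar, lexicon, and function names are assumptions for illustration. Both procedures consult the same rule tables and give the same verdicts, but they arrive at them by different computations.

```python
# Hypothetical toy grammar in Chomsky normal form (an assumption for brevity).
BINARY = {"S": [("NP", "VP")], "NP": [("Det", "N")], "VP": [("V", "NP")]}
LEXICAL = {"Fred": {"NP"}, "the": {"Det"}, "spider": {"N"}, "killed": {"V"}}

def recognize_top_down(words, cat="S"):
    """Expand categories from the goal downward, trying every split point."""
    if len(words) == 1 and cat in LEXICAL.get(words[0], ()):
        return True
    return any(
        recognize_top_down(words[:k], left) and recognize_top_down(words[k:], right)
        for left, right in BINARY.get(cat, ())
        for k in range(1, len(words))
    )

def recognize_bottom_up(words):
    """CYK-style chart: record the categories of ever longer spans."""
    n = len(words)
    chart = {(i, i + 1): set(LEXICAL.get(w, ())) for i, w in enumerate(words)}
    for width in range(2, n + 1):
        for i in range(n - width + 1):
            j = i + width
            chart[i, j] = {
                cat
                for cat, expansions in BINARY.items()
                for left, right in expansions
                for k in range(i + 1, j)
                if left in chart[i, k] and right in chart[k, j]
            }
    return n > 0 and "S" in chart[0, n]

for sentence in ["Fred killed the spider", "the spider killed Fred",
                 "killed the Fred spider"]:
    words = sentence.split()
    print(sentence, recognize_top_down(words), recognize_bottom_up(words))
```

In the terms of the text, the shared tables play the role of grammatical knowledge, while the two control strategies are different ways of putting that knowledge to use.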

Marr (1982:25) distinguishes three levels at which an information-processing system
must be understood. At the level of computational theory, it is necessary to identify the goal
of the computation, understand why it is appropriate to the task at hand, and investigate
the logic of the strategy by which it can be carried out. At the level of representation
and algorithm, the relevant question is how the computation is implemented through the
use of particular representations and algorithms. At the level of hardware implementation,
one investigates the physical realization of the representations and algorithms. In Marr's
terms, the theory of syntax is part of the level of computational theory, while a complete
description of the parsing process would include all three.



3. Language descriptions in early grammatical theories 

Under early theories of transformational grammar, each language was described by a
large system of complicated rules. The rules meticulously spelled out the details of their
operation. Although these rule systems often described the facts about various constructions
rather successfully, they failed to meet other scientific goals of linguistics:

• The reduction of grammatical phenomena to a complex, stipulative rule system did
not have the explanatory power that reduction to a small set of principles could
have.

• The choice of an unconstrained rule framework made excessively weak claims about
the properties of human languages in general, since the availability of powerful
descriptive devices in rules led to the ability to describe "languages" with properties
quite unlike those attested in natural languages.

• The amount and complexity of the information required to describe individual
languages made it a mystery how children could learn languages from the evidence
available to them.

• The lack of substantial results from universal grammar made it a mystery what
constraints a child might implicitly use to choose from the myriad possible
grammars compatible with observed sentences.

As section 4.2 will show, the scientific disadvantages of such rule systems are not merely
of theoretical interest. They carry over directly into problems for the designer of
natural-language systems.

3.1. Complicated rule systems in early grammatical theories

Until recently, the rule systems involved in transformational theories of grammar were
complicated and highly language-specific. Each rule explicitly spelled out the details of its
application. For example, the Passive Transformation of English might have been stated as
follows (Fiengo, 1977:36):

              X   NP   Y   V   NP   Z
(5)           1   2    3   4   5    6
    optional
      ==>     1   5    3   be+en  4   e   6   by  2

The rule could apply to an underlying structure that roughly corresponds to the following
surface sentence:

(6) the hippogriff loves the mermaid deeply

Operating on that structure as shown in Figure 1, the rule would produce the following
sentence with accompanying structural information:



(a)           X   NP   Y   V   NP   Z
              1   2    3   4   5    6
    optional
      ==>     1   5    3   be+en  4   e   6   by  2

(b)  [NP the hippogriff ] pres [V love ] [NP the mermaid ] deeply

(c)  (empty)  the hippogriff  pres  love  the mermaid  deeply
     X        NP              Y     V     NP           Z
     1        2               3     4     5            6

(d)  1        5            3     be+en  4     e  6       by  2
     (empty)  the mermaid  pres  be+en  love  e  deeply  by  the hippogriff

Figure 1: The old-style Passive transformation (5), here repeated as (a), would transform
the structure associated with string (6) into the structure associated with string (7). The
underlying structure (b) would match the rule condition as indicated in (c). The
correspondence established by matching would then be used to build an output structure as shown
in (d). Various theoretical details and the treatment of Tense have been glossed over in this
example.



(7) the mermaid pres be+en loved e deeply by the hippogriff 

(The symbol e refers to the empty constituent.) Other minor rules would apply to give the
passive marker -en and the tense pres their proper expression, and the passive version of
the sentence would emerge:

(8) the mermaid is loved deeply by the hippogriff 
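
The mechanical character of a rule like (5) can be made concrete with a small Python sketch. It assumes a flat encoding of the underlying structure as a list of (category, text) pairs; that encoding, the labels Tense, Adv, Aux, and P, and the function names are illustrative assumptions rather than part of the rule as stated. Because the variables X and Y can match in more than one way in general, the sketch simply returns the first analysis it finds.

```python
# Each constituent is a (category, text) pair; this flat encoding is an assumption.
UNDERLYING = [("NP", "the hippogriff"), ("Tense", "pres"), ("V", "love"),
              ("NP", "the mermaid"), ("Adv", "deeply")]

def match_passive(constituents):
    """Find factors 1..6 matching X NP Y V NP Z, with V adjacent to the second NP."""
    n = len(constituents)
    for a in range(n):                    # position of the first NP (term 2)
        if constituents[a][0] != "NP":
            continue
        for v in range(a + 1, n - 1):     # position of V (term 4)
            if constituents[v][0] == "V" and constituents[v + 1][0] == "NP":
                return (constituents[:a], constituents[a], constituents[a + 1:v],
                        constituents[v], constituents[v + 1], constituents[v + 2:])
    return None

def apply_passive(constituents):
    """Apply the structural change 1 5 3 be+en 4 e 6 by 2, or return None."""
    factors = match_passive(constituents)
    if factors is None:
        return None
    x, np1, y, verb, np2, z = factors
    return (x + [np2] + y + [("Aux", "be+en"), verb, ("NP", "e")]
            + z + [("P", "by"), np1])

result = apply_passive(UNDERLYING)
print(" ".join(text for _, text in result))
```

The string printed corresponds to (7) before the minor rules spell out be+en and pres. Every detail of the output order is written into the rule itself, which is the kind of stipulation the following discussion criticizes.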

The statement of the Passive rule (5) is quite complicated. The condition of the rule
uses both variables such as X and categories such as V to describe the surrounding context;
the action of the rule includes two movements, the insertion of an empty category, and
the insertion of be, -en, and by. This complexity is still not enough, however; since this
rule creates a by-phrase, some other rule will be needed for producing passives that do not
contain by-phrases:

(9) the temple was destroyed in 1945 

The rules defining basic constituent structure were also complicated and idiosyncratic;
they explicitly specified such details as constituent order and type. For example,
Jackendoff (1977) proposed the following phrase-structure rules to describe basic constituent order
within a sentence:

        V''' ==> (N''') (M''') V''
(10)    V''  ==> (have - en) (be - ing) ([Adv, +Trans]''')* V' (P''')* (S)
        V'   ==> V (N''') (Prt''') ([-Obj, -Det]''') (P''') ([+Obj, +Comp]''')

Here parentheses around a constituent indicate optionality, the asterisk indicates indefinite
repetition, and square brackets indicate feature notation. Jackendoff's V' rule could apply
to generate the verb phrase in this sentence:

(11) the judge [V' [V sent ] [N''' the convict ] [P''' to prison ] ]

Jackendoff also used other notational devices such as angle brackets and curly braces in the
statement of phrase-structure rules.
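
How much a single rule with optional constituents stipulates can be seen in a short Python sketch. It encodes the right-hand side of the V' rule as an ordered list of categories, each flagged as optional or not, and compiles it to a regular expression over a sequence of daughter categories. The encoding and the names V_BAR, rule_to_regex, and licenses are assumptions for illustration, not Jackendoff's formalism.

```python
import re

# Right-hand side of the V' rule as (category, optional?) pairs (assumed encoding).
V_BAR = [("V", False), ("N'''", True), ("Prt'''", True),
         ("[-Obj,-Det]'''", True), ("P'''", True), ("[+Obj,+Comp]'''", True)]

def rule_to_regex(rhs):
    """Compile a rule into a regex over space-terminated category names."""
    parts = []
    for cat, optional in rhs:
        token = "(?:" + re.escape(cat) + " )"   # each category ends with one space
        parts.append(token + "?" if optional else token)
    return re.compile("".join(parts))

def licenses(rhs, daughters):
    """True if the daughter sequence is one of the expansions the rule allows."""
    text = "".join(d + " " for d in daughters)
    return rule_to_regex(rhs).fullmatch(text) is not None

print(licenses(V_BAR, ["V", "N'''", "P'''"]))   # the verb phrase in (11)
print(licenses(V_BAR, ["V", "P'''", "N'''"]))   # same daughters, wrong order
```

The rule fixes the order, the category, and the optionality of every daughter, so any change in one of those details requires editing the rule itself.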

3.2. Scientific disadvantages of complicated rule systems 

The rule systems found in early grammatical theories had to be complicated to operate
properly. A powerful descriptive apparatus was necessary for writing down the restrictions.
Both of these facts led to unfortunate consequences.

3.2.1. Detailed rules seemed descriptively necessary 

Early generative grammarians were trying to carefully and precisely formulate rules
that could describe the properties of various grammatical constructions. In pursuing this
goal they were driven to write very detailed rules, for there seemed no other way to prevent
the rules from applying improperly. For example, the Passive rule (5) had to introduce the
copula be and the passive morpheme -en so that passive constituent order couldn't surface
with active verb forms:

(12) *the mermaid loves deeply by the hippogriff 

It had to mention V so that the proper insertion position for be+en could be specified.
Adjacency to V was also required so that other ungrammatical sentences wouldn't be
generated:

(13) (i) John hit Bill with a club 

(ii) *a club was hit Bill with 

Even with the detailed rule, some unwanted cases might slip through depending on how the
other rules worked:

(14) (i) projects like that, he'll never get ME to support

(ii) *me, he'll never get e to support by projects like that

And in any case, another rule would be needed for generating agentless passives such as (9).
The theory would thus fail to capture any similarity between "long" and "short" passives.

3.2.2. Complex rule systems are not explanatory 

The early grammatical theories were fairly successful at using systems of rules to
capture the properties of various constructions, but the rule systems were highly stipulative.
The theories stated the rules, but could give no theoretical reasons why the details of the
rules should be the way they were. For example, the statement that a rule is obligatory is
merely descriptive, leaving unexplained the question of why a derivation in which it fails to
apply results in ungrammaticality.

Science is generally not content to leave complexity unexplained, saying the complexity
is "just the way things are," but always strives to explain it through reduction to simpler
principles. There was the possibility that the complicated rule systems were only derivative,
corresponding to the combined effects of more fundamental principles rather than being
fundamental in themselves. If that turned out to be the true situation, the early theories
could still be partly correct. The general processes that they took to be involved in the
derivation of various constructions could still be involved, but with the details of their
operation following from general principles rather than the details of rule statements.

3.2.3. Complex rule systems are too unconstrained 

The rule systems also drew on a powerful, unconstrained descriptive apparatus. In
attempting to restrict the application of rules to their proper domains, grammarians used
a wide variety of notational devices in the rule patterns or structural descriptions (SDs) of
rules:

Among the enrichments of the theory of SDs that appear in the literature,
theoretical and applied, are the following: disjunctions of [SDs], meaning
that the factors may satisfy any one of the disjuncts; wider possibilities
for [individual elements of rule patterns]; SDs defined in terms of Boolean
conditions [on the set of SDs applying to the sentence]; conditions
expressed in terms of quantifiers; conditions involving grammatical relations
[such as subject and object]; SDs expressing quite arbitrary conditions on
phrase markers or even sets of noncontiguous phrase markers of a
derivation; SDs expressing conditions not limited to a single derivation; SDs
involving extrasyntactic or even extragrammatical factors, e.g., beliefs.
(Chomsky, 1976:310)

If linguistic theory makes available without constraint such a wide variety of mechanisms
for use in language descriptions, it will make extremely weak claims about what constitutes
a possible natural language. Unless the descriptive apparatus is further constrained, the
theory of universal grammar will be scientifically vacuous because it will claim virtually
nothing.

A weak theory of universal grammar is thus undesirable on general scientific grounds.
However, it is further undesirable because it is incorrect: it makes the wrong predictions
about the range of variation in natural languages. A weak theory predicts that natural
languages can potentially differ arbitrarily much in structure, but this wide range of variation
is not attested.

For example, as Baltin (1981:4) notes, Wackernagel's Law states that a phenomenon
called cliticization always places clitics either in second position in the sentence or attached
to the verb. There are apparently no languages in which clitics attach to the last noun
phrase in the sentence, or to the third word ignoring constituent boundaries. Greenberg
(1963) cites other simple examples of regularities among languages.

3.2.4. Complex rule systems make a mystery of language learning 

A fundamental problem of generative linguistics is to discover the form and content
of the knowledge that a person acquires when learning a language. The hypothesis that
this knowledge takes the form of a system of complicated rules, complete with information
about the order in which they must apply and about whether they are obligatory or optional,
makes it hard to understand how children could ever learn their native tongues. Stowell
(1981:64) notes this problem in connection with the linguistic theories of the sixties:

The very complexity and variety of the transformational grammars of
individual languages frustrated attempts to develop explanatory theories of
language acquisition. Although there were some promising possibilities of
formal linguistic universals, most of the complexities in specific
grammatical rules appeared to be tremendously idiosyncratic. This was perhaps
most obvious for the transformational rules, each of which appeared to
require an arbitrary collection of elementary operations ... and various
mysterious conditions preventing individual rules from applying in certain
environments. It was obvious, from the perspective of a reasonable
theory of acquisition, that these complexities could not be directly learned
on the basis of experience, since the learning task would have to depend
on explicit negative evidence of a very obscure kind .... On the other
hand, very few of the observed conditions could be deduced from known
properties of the language faculty, leading Chomsky [(1965:46)] to remark
that "no present-day theory of language can hope to attain explanatory
adequacy beyond very restrictive domains."

With detailed systems of language-particular rules, there are just too many details in the
description of a language for the language learner ever to acquire it.

3.2.5. An unconstrained framework makes learning impossible 

Language acquisition requires the learner to construct a grammar on the basis of finite
evidence. The grammar can apply to an indefinitely large range of sentences not heard
before. If the language learner is to be successful, the constructed grammar must agree
with the grammars of others in the speech community.

The language learner cannot succeed if armed only with very weak constraints on what
the structure of the target language might be like. There are just too many ways to project
beyond experience. In a sufficiently powerful descriptive framework, for instance, there
are indefinitely many grammars compatible with any finite amount of linguistic experience.
The language learner must use some principle of universal grammar to choose among them.
Without such a principle, the learner may not choose a grammar that agrees with those of
others.

An unconstrained framework with a weak theory of universal grammar gives the
language learner almost no guidance about how to solve the problem of projecting beyond a
finite range of observed evidence. Language learning under such circumstances is impossible.
More restrictive theories are necessary in order to explain language acquisition.
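
The projection problem can be illustrated with a deliberately tiny Python sketch. It treats four hypothetical "grammars" over the alphabet {a, b} as membership tests, checks which of them are compatible with a finite sample of observed strings, and then compares their verdicts on strings not yet observed. The candidate set, the sample, and the function name consistent are all assumptions chosen for illustration; real hypothesis spaces are vastly larger.

```python
import re

# Hypothetical candidate "grammars", each given as a membership test on strings.
CANDIDATES = {
    "a^n b^n": lambda s: len(s) % 2 == 0
    and s == "a" * (len(s) // 2) + "b" * (len(s) // 2),
    "a* b*": lambda s: re.fullmatch(r"a*b*", s) is not None,
    "(a|b)*": lambda s: re.fullmatch(r"[ab]*", s) is not None,
    "length <= 4": lambda s: len(s) <= 4,
}

def consistent(candidates, observed):
    """Names of the candidates that accept every observed string."""
    return [name for name, accepts in candidates.items()
            if all(accepts(s) for s in observed)]

observed = ["ab", "aabb"]          # finite positive evidence
unseen = ["aab", "ba", "aaabbb"]   # strings the learner has not heard

print(consistent(CANDIDATES, observed))
for name, accepts in CANDIDATES.items():
    print(name, [accepts(s) for s in unseen])
```

All four candidates fit the observed sample, yet no two of them agree on all three unseen strings. Without some further constraint on the space of candidates, the evidence alone gives no reason to prefer one projection over another.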



4. Language descriptions in natural-language systems 

Modern syntactic theories have found a cure for the scientific ills of section 3.2, and
section 5 will describe it. First, however, this section will establish that the language
descriptions that underlie existing natural-language parsers have many of the same problems
that beset early syntactic theories. Parser design could benefit from the same curative
measures that improved linguistic theory.

The grammars that are embodied in most existing parsers consist of complex,
language-dependent rule systems that explicitly spell out such matters as the orders and types of
constituents in various constructions. The practice of natural-language parsing is thus in
roughly the same situation as early linguistic theory: each language is described by a large
set of complicated rules that exhaustively specify the details of their application. In much
the same way that complicated, language-dependent rule systems fall short of the scientific
goals of linguistics, they make it difficult to meet the engineering goal of constructing
natural-language systems:

• Describing grammatical phenomena by means of a complex, stipulative rule system
instead of reducing them to underlying principles leaves unanswered the question
of why the details of the rule system are the way they are. Without principles
that explain why the details should be one way rather than another, the system
designer is just as likely to get them wrong as right.

• The choice of an unconstrained rule framework makes weak claims about what
natural languages are like. The unrestricted availability of powerful descriptive
devices gives the system designer the unwanted ability to describe "languages"
with properties quite unlike those attested in natural languages.

• When the descriptions of individual languages are large and complex it is something
of a mystery how a system designer can ever succeed at building a parser. Surely
this notoriously difficult task could be easier with a more concise characterization
of the differences among languages.

• Large grammars can also make natural-language systems run slowly.

• Like the language learner, the system designer must arrive at a rule system that
projects beyond the example sentences that shaped its design. The failure to seek
guidance from universal grammar leaves the designer without constraints to aid in
choosing from the myriad possible language descriptions that will work properly
on simple examples.


4.1. Complicated rule systems in existing natural-language programs

Whether it is an augmented transition network, an augmented context-free grammar,
or a set of pattern-action rules, the rule system that encodes the linguistic knowledge of
a current natural-language system is likely to be large, complicated, and highly
language-dependent. A few examples will illustrate.

4.1.1. Existing parsing rules are complicated 

The language descriptions that underlie existing natural-language parsers are made up
of complex rules that generally spell out the details of their application quite specifically.
Like Jackendoff's phrase-structure rules (10), even unadorned context-free rules spell out
the order, type, and obligatoriness of constituents in various constructions. Most systems,
however, spell out much more.

Robinson (1982:42), for instance, cites the verb-phrase rule shown in Figure 2 as typical
of the rules used in a system for interpreting English dialogue. (Not surprisingly given the
complexity of this rule, transcription errors appear to have affected parenthesis matching in
the published version.) ATN-based systems also use detailed tests and actions on grammar
arcs; see Figure 3.

Even Marcus (1980), who constrains the information available to parsing rules, uses
some rather complicated tests and rule-packet activations that tell the parser what
constituents to expect and where to attach them. Figure 4 illustrates. Marcus's framework
also requires the parser designer to notice potential ambiguities in the interpretation of
surface cues, writing diagnostic rules to decide between competing possible parser actions.
A diagnostic rule for a construction might be considered the most detailed rule of all, since
it requires the parser designer to consider not only the construction at hand, but all other
constructions that might look similar given the limited information available to the parser
at various points. Marcus's diagnostic rules also tend to require access to a wider range of
information than other grammar rules. Figure 5 gives examples.

4.1.2. Existing rule systems are large 

In addition to being detailed, the description of a language that underlies a typical
parsing system is lengthy. A typical ATN system has several hundred arcs; for instance, Bates
(1978:238) mentions one with 83 states, 202 arcs, and 386 actions. Robinson (1982:27)
explicitly describes the DIAGRAM augmented phrase-structure grammar as "large and complex,"
and the set of verb-phrase rules in that system (:45f) seems to bear out that description.
The rules are shown here in simplified form:

(15)  VP = V (NP1 ([NP2 / P]))
      VP = V P (NP)
      VP = V (NP) ("THAT") SDEC
      VP = V (NP) INFINITIVE
      VP = V (NP) [PPL VP / ADJP]
      VP = V (NP) (ING) [VP / BE PRED]




(VP1 VP = V (NP1 (NP2 / P));
  CONSTRUCTOR (PROG ((PARTICLE (@ DIAMOND.SPELLING P)))
                (COND
                  [(@ NP1)
                   (OR (@ DIROBJ V)
                       (F.REJECT (QUOTE F.DIROBJ)))
                   (COND
                     ((@ NP2)
                      (OR (@ INDIROBJ V)
                          (F.REJECT (QUOTE F.INDIROBJ))))
                     ((@ P)
                      (OR (FMEMB PARTICLE (@ PARTICLE V))
                          (F.REJECT (QUOTE F.PARTICLE)))
                      (AND (@ PRO NP1)
                           (@FACTOR (QUOTE F.PARTICLE)
                                    LIKELY))
                      (COND
                        ((@ NCOMP NP1)
                         (OR (@ NP NCOMP NP1)
                             (@FACTOR (QUOTE F.PARTICLE)
                                      UNLIKELY)
                         (AND (@ NCOMP NP NCOMP NP1)
                              (@FACTOR (QUOTE F.PARTICLE)
                                       UNLIKELY))))
                  (T (@SET BAREV T)
                     (@FROM V DIRECTION DIROBJ))))
  TRANSLATOR (PROGN [COND
                      ((@ NP)
                       (@SET ROLE (QUOTE DIROBJ) NP2)
                       (@SET ROLE (QUOTE INDIROBJ) NP1)
                       (@SET SEMANTICS (COMBINE
                                         (@ SEMANTICS V)
                                         (@ SEMANTICS NP2)
                                         (@ SEMANTICS NP1)))
                      (T (AND (@ NP1)
                              (OR (@ INDIROBJ V)
                                  (@SET ROLE (QUOTE DIROBJ) NP1))
                         (@SET SEMANTICS (COMBINE
                                           (@ SEMANTICS V)
                                           (@ SEMANTICS NP1))))



Figure 2: This verb-phrase rule from the DIAGRAM system of Robinson (1982) is complex
and detailed.




(VP/V
  (CAT V (AND (GETF PASTPART)
              (EQUAL (GETR V) (QUOTE BE)))
         (HOLD (QUOTE NP) (GETR SUBJ))
         (SETRQ SUBJ (NP (PRO SOMEONE)))
         (SETR AGFLAG T)
         (SETR V *)
         (TO VP/V))
  (CAT V (AND (GETF PASTPART)
              (EQUAL (GETR V) (QUOTE HAVE)))
         (ADDR TNS (QUOTE PERFECT))
         (SETR V *)
         (TO VP/V))
  (CAT V (AND (GETF UNTENSED)
              (GETR MODAL)
              (NULLR V))
         (SETR V *)
         (TO VP/V))
  (CAT V (AND (GETF PRESPART)
              (EQUAL (GETR V) (QUOTE BE)))
         (ADDR TNS (QUOTE PROGRESSIVE))
         (SETR V *)
         (TO VP/V))
  (JUMP VP/HEAD T
        (COND ((OR (GETR MODAL) (GETR NEG))
               (SETR AUX (BUILDQ ((@ (AUX) + +))
                                 MODAL NEG))))))



Figure 3: This simplified ATN state forms part of the verb-phrase network in a grammar
described by Bates (1978:208). Like the rule in Figure 2, it is complex and detailed.



      VP = V (NP) [WHPP / WHNP / WHADJP] [SDEC / INFINITIVE]
      VP = VP (",") [PP / INFINITIVE / ADVP]

Marcus's (1980) parser is somewhat smaller; one version has 101 rules, and many of those
rules pertain to numbers, dates, and other idiosyncratic elements of his parsing application.
In part, the smaller size of Marcus's parser derives from the fact that it is more closely related
to transformational accounts of grammar than to accounts that use phrase-structure rules
to describe surface configurations directly. (See Marcus, Chapter 5.)

4.1.3. Existing rule systems are language-dependent 

The highly language-dependent character of the above-cited systems should be clear
from the sample rules given. Surely the details of what to expect at various points in a parse
would change when going from English to a verb-final language, a postpositional language,
or a language with no ambiguity between prepositional phrases and infinitives.

Naturally, any rule system that expresses knowledge of a particular language must
change from language to language. The unfortunate characteristic of existing rule systems
is not that they differ from language to language, but that they differ more than the
language structures do. Existing parsers do not seem to be modularized in such a way
that changing a single language characteristic corresponds to changing a single part of the
language description. Since small changes in underlying parameters can have large effects




{RULE main-verb PRIORITY: 15. IN PARSE-VP
   [=verb] -->
   Deactivate parse-vp.
   If c is major then activate ss-final else
      if c is sec then activate emb-s-final.
   Attach a new vp node to c as vp.
   Attach 1st to c as verb.
   Activate cpool.
   If there is a verb of c and it is passive
      then activate passive; run passive next.
   If it is inf-obj then
      if it is to-less-inf-obj then activate to-less-inf-comp andthen
      if it is to-be-less-inf-obj then activate to-be-less-inf-comp andthen
      if it is 2-obj-inf-obj then activate 2-obj-inf-comp
         else activate inf-comp;
      if it is subj-less-inf-obj then activate subj-less-inf-comp
         else if it is no-subj then activate no-subj.
   If it is that-obj then activate that-comp.
   If there is a WH-comp and it is not utilized
      then activate WH-vp else
      if the current S is major then activate ss-vp else
         activate embedded-s-vp.}

{RULE WH-WITH-NP-PP-NEXT PRIORITY: 7 IN WH-VP
   [=np] [=prep] -->
   If the greatest possible number of objects of c is greater than 1
         and a prepositional phrase of 2nd and the WH-comp
            fits a pp slot of c
      or
         the greatest possible number of objects of c is equal to 1
         and a prepositional phrase of 2nd and the WH-comp
            fits a pp slot of the current s
      then run objects next else
   If the greatest possible number of objects of c is greater than 1
      then run wh-with-np-next next else
   Run too-many-nps next.}

{RULE WH-WITH-PP-NEXT PRIORITY: 5 IN WH-VP
   [=prep] [=np] -->
   If a prepositional phrase of 1st and 2nd fits a pp slot of c
      then run pp next else
   If it isn't true that
         a prepositional phrase of 1st and the WH-comp
            fits a pp slot of c
      then if the greatest possible number of objects of c
              is greater than then run create-wh-trace next
           else run too-many-nps next
      else
   If the lowest possible number of objects of c is greater than
      then run create-wh-trace next else
   run wh-pp-build next.}



Figure 4: These rules from Marcus (1980) illustrate that the rule-packet structure of the
parser can be somewhat intricate and the rule actions can be complicated.






{RULE HAVE-DIAG PRIORITY: 5 IN SS-START
   [=*have, tnsless] [=np] [t] -->
   If 2nd is ns, n3p or 3rd is not verb, or 3rd is tnsless
      then run imperative next else
      run yes-no-q next.}

{RULE WHICH-DIAGII IN CPOOL
   [=*which; * is not any of quant, relpron] -->
   If the np above c is np, modible then
      label 1st pronoun, relpron, wh
      else label 1st quant, ngstart, ns, npl, wh.}

{RULE THAT-DIAG-1 IN CPOOL
   [=*that; * is none of comp, det, pronoun] [=np] -->
   If there is not a det of 2nd
      and there is not a qp of 2nd
      and the nbar of 2nd is none of npl, massn
      and 2nd is not not-modifiable
      then attach 1st to 2nd as det; label 1st det, ns
      else if c is a nbar then label 1st pronoun, relpron
      else label 1st comp.}

{RULE THAT-DIAG-3 PRIORITY: 5 IN CPOOL
   [=*that; * is none of pronoun, comp] [=np]
   [**c; the verb of the vp of the current s is that-obj;
         the lowest possible number of objects of the current s
         is equal to 2] -->
   Label 1st comp.}

Figure 5: In the framework of Marcus (1980), diagnostic rules such as these decide
between different possible parsing actions when the normal grammar rules are not sufficient
to determine what to do next.



on the surface distribution of constituents, it is not surprising that parsing rules should be
highly language-dependent when they spell out the details of their surface application.

Subject-verb agreement provides one example. Suppose an ATN parser checks agreement
by storing grammatical features of the subject in a register and later comparing them
to features of the verb. If the parser is to be adapted to parse a verb-initial language, in
addition to rearranging arcs it will be necessary to swap the register store and register
comparison operations. To take another example, the mechanism that Marcus (1980) uses to
construct noun phrases relies heavily on the fact that English noun phrases are
determiner-initial and hence determiners will be encountered first in a left-to-right scan.
Adapting the Marcus parser to a determiner-final language could require substantial revision.
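
The agreement example can be made concrete with a small sketch. The Python below is
illustrative only: the Word record, the category labels, and the three functions are
hypothetical and are not drawn from any actual ATN system. It contrasts two register-style
checks, whose store and compare steps are tied to surface order, with a single
order-independent statement of agreement.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    form: str
    cat: str     # "N" for the subject noun, "V" for the verb (hypothetical labels)
    number: str  # "sg" or "pl"


def agree_subject_first(words):
    """Register-style check for subject-verb order: store at N, compare at V."""
    register = None
    for w in words:
        if w.cat == "N":
            register = w.number            # store the subject's features
        elif w.cat == "V" and register != w.number:
            return False                   # compare against the stored features
    return True


def agree_verb_first(words):
    """The same check for verb-subject order: store and compare are swapped."""
    register = None
    for w in words:
        if w.cat == "V":
            register = w.number
        elif w.cat == "N" and register != w.number:
            return False
    return True


def agree_any_order(words):
    """Order-independent statement: subject and verb must share one number value."""
    return len({w.number for w in words if w.cat in ("N", "V")}) <= 1


dogs_bark = [Word("dogs", "N", "pl"), Word("bark", "V", "pl")]
bark_dogs = list(reversed(dogs_bark))
```

Under these assumptions, each order-specific checker rejects the other word order even
though agreement holds, whereas the order-independent statement needs no revision when
the order changes.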

4.2. Engineering disadvantages of complicated rule systems 

Many of the scientific disadvantages that afflicted complex, language-specific rule
systems in linguistics translate into engineering disadvantages that afflict similar rule systems
in the realm of natural-language processing. They help make parser design a difficult task.


4.2.1. Detailed rules might seem descriptively necessary

Detailed rules may seem necessary to the designer of a natural-language system just as
they seemed necessary to early grammarians. After all, something must account for surface
complexity. In a parser built on context-free rules, for example, it is necessary to have a
complicated rule system. Context-free rules directly spell out the surface orders and types
of constituents in various constructions, so they must reflect surface complexity in rather
direct fashion.

There is an alternative, however. Modern linguistic theory accounts for surface
complexity by invoking the combined operation of several independent systems of principles.
If parsing were based around such principles rather than explicit rules, there might be no
need for detailed rules in describing the "core grammar" of a language.

4.2.2. Complex rule systems are not explanatory 

A linguistic theory that describes grammatical phenomena by means of a complex,
stipulative rule system instead of reducing them to underlying principles is at a scientific
disadvantage because it does not explain why the details are the way they are. This
disadvantage applies in the engineering domain as well. Without principles to explain why the
details of rules should be one way rather than another, the designer can easily get them
wrong.

4.2.3. Complex rule systems are too unconstrained 

It is a theoretical advantage for a theory to place strong limits on the allowable set of
rules of grammar, since a theory that places weak limits says very little about the nature
of language. Once again, this theoretical advantage translates into a practical one. It
would be easy to construct a language description for use in a parser if the grammatical
framework provided so many constraints that the parser designer was left with no choice
but to write the correct grammar! Correspondingly, it is very difficult to write a grammar
when the grammatical framework is completely unconstrained, giving no clues at all about
the properties of the correct grammar.

A somewhat frivolous example may help to illustrate the point. In an unconstrained
parsing system, the grammar writer is given complete freedom to write the grammar
according to personal choices. There is nothing to stop the parser designer from writing rules that
are sensitive, say, to whether the number of words processed so far in the input sentence is
prime.

Such freedom is an advantage to a programmer who intends to write a prime-number
generator, but it is a hindrance to the designer of a natural-language system. Rules about
prime numbers are not needed for parsing any natural language, so the major freedom
granted is the freedom to make mistakes.*

* In a sense, the search for a restrictive parser-writing framework is thus similar in spirit to the effort within
computer science to design computer languages that do not allow certain kinds of erroneous programs to be
expressed. In both cases, there is an attempt to fit the framework rather exactly to the range of problems
to be solved. Just as prime-number rules are not needed for describing natural-language syntax, programs
that apply operations to inappropriate data types are not needed for useful programming applications.


In reality there is little danger that the parser designer will accidentally write into the
grammar a dependence on prime numbers, but there is a danger that the designer will write
in conditions that are unnatural in other ways. The more constraints a theory can offer,
the more guidance it offers the grammar writer; the more constraints the better, so long as
the constraints do not rule out the correct grammar for the language at hand. A tightly
constraining theory of grammar makes the grammar writer's task easier.

A restrictive theory of universal grammar can also expand the available range of parser
implementation options. The more specific the restrictions on grammars, the greater the
probability that special properties of grammars may allow them to be efficiently processed
or perspicuously implemented. To take an example outside the domain of natural language,
finite-state automata can be simulated more simply if they are known to be deterministic
than if they may be nondeterministic.
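
The finite-state remark can be sketched in a few lines of Python. Everything here is an
illustrative assumption: the function names, the transition-table encoding, and the toy
automata (both accept strings over a and b that end in "ab"; the nondeterministic one
has no empty transitions). The deterministic simulator carries a single current state,
while the nondeterministic one must carry a set of states.

```python
def run_dfa(delta, start, accepting, text):
    """delta maps (state, symbol) -> state, so one current state suffices."""
    state = start
    for ch in text:
        if (state, ch) not in delta:
            return False
        state = delta[(state, ch)]
    return state in accepting


def run_nfa(delta, start, accepting, text):
    """delta maps (state, symbol) -> set of states, so a set must be tracked."""
    current = {start}
    for ch in text:
        current = {t for s in current for t in delta.get((s, ch), set())}
        if not current:
            return False
    return bool(current & accepting)


# Toy automata for "ends in ab" (hypothetical examples).
DFA_DELTA = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 1, (1, "b"): 2,
    (2, "a"): 1, (2, "b"): 0,
}
NFA_DELTA = {
    (0, "a"): {0, 1}, (0, "b"): {0},
    (1, "b"): {2},
}
```

The extra bookkeeping in run_nfa is the price of not knowing that the machine is
deterministic; knowing the restriction in advance permits the simpler implementation.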

4.2.4. Complex rule systems make describing particular languages difficult 

A linguistic theory that hypothesizes large systems of language-specific rules as the
basis for the native speaker's knowledge of language is at a scientific disadvantage because
it cannot account for the ease with which children acquire their languages. The description
of a language takes too many details.

This disadvantage also operates in the engineering domain. The difficulty of writing
a language description for a parser can be expected to grow as the description gets larger.
The parser designer cannot easily understand an ATN system with hundreds of states and
thousands of arcs. Just as concise characterizations of the syntactic parameters along which
languages differ make it possible to approach the goal of explaining language acquisition,
they can make it easier for the parser designer to specify the differences among French,
Italian, and Warlpiri.
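
As a rough illustration of what a concise, parameter-based description might look like,
the Python sketch below describes word order with a single setting. The Parameters
class, the head_initial parameter, and the toy structure are hypothetical simplifications
introduced here for illustration; they are not a claim about the actual parameters of any
particular language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameters:
    head_initial: bool  # hypothetical: does a head precede its complement?


def linearize(tree, params):
    """Flatten a (head, complement) structure into a word list."""
    if isinstance(tree, str):
        return [tree]
    head, complement = tree
    h = linearize(head, params)
    c = linearize(complement, params)
    return h + c if params.head_initial else c + h


# One order-free structure: a verb whose complement is itself a head plus complement.
vp = ("see", ("near", "river"))
```

Changing the one parameter value changes the surface order throughout the structure,
so the difference between the two hypothetical languages is stated in one place rather
than in every rule.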

4.2.5. Complex rule systems can slow down parsers 

The size of the underlying rule system figures in the running time of many parsing
algorithms. Earley's (1970) algorithm for parsing context-free grammars, for example, can
quadruple its running time when grammar size is doubled.*
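
The arithmetic behind this remark can be sketched as follows, assuming a worst-case
cost model in which running time grows with the square of grammar size and the cube of
sentence length. The function name, the unit constant, and the sample sizes are
hypothetical; the model is an assumption used for illustration, not a measurement of any
implementation.

```python
def worst_case_cost(grammar_size, sentence_length, constant=1.0):
    """Assumed cost model: constant * |G|^2 * n^3 (illustrative only)."""
    return constant * grammar_size ** 2 * sentence_length ** 3


base = worst_case_cost(grammar_size=100, sentence_length=10)
doubled = worst_case_cost(grammar_size=200, sentence_length=10)

# Doubling grammar size multiplies the assumed bound by 2^2 = 4,
# whatever the sentence length.
ratio = doubled / base
```

Under this model, any reduction in grammar size pays off quadratically, which is why a
shorter language description can matter for speed as well as for clarity.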

4.2.6. An unconstrained framework makes system extension difficult

Explaining how a language can be acquired on the basis of finite linguistic experience is
a major theoretical goal of linguistics. The language learner cannot succeed given only weak
constraints on what the structure of the target language could be like. In an unconstrained
framework, an indefinitely large number of grammars will be compatible with any finite
amount of linguistic experience. Few of these grammars will yield appropriate results when
applied to sentences not heard before.

* This argues against Fodor's (1983) claim that modular systems of grammar lead to less efficient parsers.
Fodor claims that a parser based on a modular theory will be at a disadvantage because it must access and
integrate information from more than one source. On the one hand the possibility of limited parallelism
can vitiate that objection, while on the other hand the combinatorial effects involved in a non-modular
system can increase its size enough to make it run more slowly rather than faster.


The designer of a natural-language system faces a problem that in a few respects
is similar to that of the language learner. Any language description that the designer
constructs will project in some way beyond the example sentences that shaped its design.
In an unconstrained framework, the designer can choose from a multitude of systems that
will work properly on the examples that have been considered at a given point. Only a
few, however, will also apply properly to complex examples. An unconstrained framework
gives the designer no help at making a felicitous choice. It treats more or less equally the
different possible ways of projecting beyond the examples considered so far.

The ultimate possibility of explaining language acquisition shows how far a restrictive
theory of universal grammar could in principle go toward making the task of language
description easier. With language acquisition well-understood, a mechanical algorithm might
be implemented that could acquire the syntax of a natural language through exposure to
its sentences. The task of writing a syntactic description of the target language would then
be trivialized.



5. The shift to modular theories of grammar 

Sections 3.2 and 4.2 have shown that language descriptions made up of large systems of
detailed rules have both scientific and engineering disadvantages. Modern linguistic theory
has cured those scientific ills by shifting from the study of complex rule systems to the study
of modular subsystems of grammatical principles and parameters.

Rules still exist, but the rule systems are increasingly regarded as simple and
impoverished. No longer does each rule meticulously spell out the details of its application; rather,
the conditions of proper rule application are determined by general principles that constrain
linguistic representations. Many of the principles are universal and hence are not stated in
the descriptions of particular languages.

The new modular theories of grammar solve many of the problems that were associated
with earlier theories:

• They provide better explanations of many grammatical phenomena by reducing
  them to a small set of principles rather than a complicated, stipulative rule system.

• They allow universal grammar to place strong constraints on the possible range of
  "core syntactic rules" since they do not require the details of rule application to
  be stated in the rules themselves.

• They reduce the mystery of language acquisition by condensing the basic syntactic
  description of an individual language down to a set of values for a small list of
  parameters. Grammars are no longer huge and complicated.

• The strong constraints that they place on possible grammars simplify the language
  learner's problem of choosing from the possible grammars compatible with observed
  sentences. The number of possible grammars is no longer astronomical.


5.1. The scientific benefits of modular theories 

The surface behavior of a system that is composed of interacting components usually
presents a bewildering array of complexity. When such complexity shows up in the theory
of a system as well, it is often a symptom that theory has not yet penetrated to the true
underlying principles that govern system operation. A theory that needs epicycles upon
epicycles may be describing derivative effects rather than fundamental laws.

Recent linguistic theories attempt to understand the apparently complex properties
of various constructions as arising from interaction among different principles and
grammatical subsystems. When such modular theories are possible, they can be expected to
have scientific advantages. Through the process of untangling separate effects they are able
to reduce to simpler principles many of the complicated stipulations that would otherwise
seem necessary.

5.1.1. Modularity yields brevity and simplicity 

According to a modular theory of grammar, surface linguistic phenomena result from
the interaction of independent subsystems of grammatical rules and principles; the
components of different subsystems typically have different functions and properties. According
to a non-modular theory, surface linguistic phenomena result from the operation of a single,
unitary rule system; grammatical rules are of the same type throughout.

When the phenomena at hand admit a modular description, a modular theory will be
simpler than a non-modular theory. If it is possible to describe the phenomena in terms of
separate subsystems acting independently, then a modular theory can simply describe the
separate subsystems. A non-modular theory, however, must describe the combined surface
effects because it refuses to untangle the separate underlying factors.

It is easy to find examples of how the choice of a non-modular theory over a modular
one can cause grammar expansion. For example, the grammar gets larger when phenomena
such as subject/verb agreement and apparent movement of displaced constituents are
handled in the same rule system that defines the basic constituent structure of the language.
Consider a non-modular grammar that handles subject/verb agreement by multiplying the
number of rules and nonterminals in the grammar, using such rules as S => NP_sg VP_sg and
S => NP_pl VP_pl. Such a grammar will be larger than a modular grammar that treats
subject/verb agreement by superimposing agreement rules on a simpler grammar that ignores
agreement.
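
The growth can be made concrete with a small sketch. The Python below is illustrative
only: the toy base grammar, the feature inventory, and both function names are
hypothetical. It contrasts copying every rule once per agreement value with stating the
agreement condition once, separately from the base grammar.

```python
AGREEMENT_VALUES = ["sg", "pl"]  # hypothetical feature inventory

# Toy base grammar that ignores agreement (hypothetical rules).
BASE = [
    ("S", ["NP", "VP"]),
    ("NP", ["Det", "N"]),
    ("VP", ["V"]),
]


def expand_with_agreement(rules, values, start="S"):
    """Non-modular treatment: copy each rule once per agreement value,
    threading the value through the nonterminals as a subscript.
    The start symbol itself is left unsubscripted."""
    expanded = []
    for lhs, rhs in rules:
        for v in values:
            new_lhs = lhs if lhs == start else f"{lhs}_{v}"
            expanded.append((new_lhs, [f"{sym}_{v}" for sym in rhs]))
    return expanded


def agrees(subject_number, verb_number):
    """Modular treatment: one agreement condition, stated once."""
    return subject_number == verb_number


expanded = expand_with_agreement(BASE, AGREEMENT_VALUES)
```

In this sketch the non-modular grammar has len(BASE) * len(AGREEMENT_VALUES) rules,
and the factor grows with every additional feature value, while the modular description
stays at the size of the base grammar plus one condition.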

5.1.2. Modularity yields tighter constraint 

Since a modular theory separates subsystems that have different properties, a modular
theory can also make stronger claims than a non-modular one. Suppose a modular theory
postulates subsystems of rules of types A and B, while a non-modular theory uses only a
single rule type C. Rules of type A and type B must have somewhat different properties, or
there is no reason to place them in different subsystems. It is unavoidable, then, that the
statements that can be made about rules of type C must be weaker than the statements
that can be made about rules of types A and B. Certain generalizations are necessarily lost
in going to rules of type C, since two mechanisms with different properties have been made
to look alike; the properties that distinguish rules of type A from type B cannot be true of
all rules of type C. As usual, increased generality yields weakened constraint.

Early transformational grammars, for instance, used the single mechanism of
transformations to describe both the reference of pronouns and the displaced position of wh-words
in questions. More recent theories assign the treatments of these phenomena to separate
grammatical components. A transformation handles wh-movement, but interpretive rules
handle pronominal reference. Chomsky (1976) notes that once transformations and
interpretive rules are separated, they can be seen to have different properties and obey different
sets of constraints. It is possible to tighten the range of possible transformations as well as
the range of possible interpretive rules.

5.2. Factoring constraints out of grammatical rules 

Factoring general constraints out of syntactic rules simplifies grammatical theory
because constraints do not have to be repeatedly stated in the conditions of rules to which
they apply. The simplification of individual rules also reduces the number of rules needed
because many rules that were distinct in earlier theories turn out to be the same when
cluttering details are removed.

5.2.1. Transformations have been reduced to simple forms 

Generative grammarians have long sought to discover the restrictive set of conditions
on possible rules of grammar that allows the language learner to converge on the correct
grammar based on limited evidence. The shift from rules to principles has its historical
roots in the quest to reduce the possible variety of transformations.

Chomsky (1976) proposed to impose on the structural descriptions of transformations
a condition that would restrict the use of categorial symbols such as NP. An SD would not
be allowed to mention two successive categorial symbols unless one or the other represented
a constituent changed by the rule. In particular, the following detailed SD for Passive would
be ruled out:

(16)  X NP Aux V NP by Δ X
      (Here Δ is a dummy marker.)

Under Chomsky's proposed restriction and some additional assumptions, the SDs for 
the main operations involved in the derivation of passive sentences would have a simpler 
form instead: 

(17) X NP X NP X 

This line of argument eventually led to a very general formulation of the movement rule
involved:

(18) Move NP 

The movement involved in Passive was thus seen as one manifestation of a rule that says
"move any NP anywhere" rather than the result of a rule with a detailed context of
application.


When traditional transformational rules were simplified in so drastic a fashion, many
that had previously been considered distinct collapsed into one. For example, the traditional
rule of Raising to Subject came out as just another manifestation of Move NP:

(19)  (i)  [NP e] seems the bear to be hungry
      (ii) the bear seems e to be hungry

Another whole collection of grammatical processes came out as instances of another simple
rule called Move-wh.

In the new theories, traditionally distinct grammatical processes were thus regarded as
formally identical. They no longer corresponded to the operation of separate rules:

     The notions "passive," "relativization," can be reconstructed as processes
     of a more general nature, with a functional role in grammar, but they are
     not "rules of grammar." (Chomsky, 1981:7)

It was clear that the complexity of the transformational component would be reduced if
transformations had the simple and general character illustrated in (18) rather than the
detailed, one-rule-per-process character of the old rules such as (5).

5.2.2. Constraints rule out misapplication 

As Chomsky realized, however, rules such as Move NP overgenerate massively unless
restricted in some way. Consider this "derivation," for instance:

(20)  (i)  John saw Bill
      (ii) Bill saw e

Why can't Move NP turn (20i) into (20ii)? If it were to turn out that some ad hoc condition
would be required in order to prevent such derivations, there would be no advantage to
"simplifying" rules down to minimal form. Complexity would simply be shifted from one
part of the grammar to another.

No ad hoc conditions are required, however. Most of the "bad" movements are ruled out
by conditions that have independent justification. For example, the above movement is ruled
out independently by several different general conditions in modern linguistic theory. One of
the simplest is the principle of recoverability of deletion,* which among other consequences
forbids a rule from moving a constituent into a position that already has another constituent
in it. In the above case, recoverability of deletion forbids moving Bill on top of John.
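
The division of labor between a maximally simple rule and an independent constraint can
be sketched in Python. The encoding below is a hypothetical simplification introduced
for illustration: a structure is a list of (category, content) pairs, an empty NP position
has content None, and the check on the landing site is only a crude stand-in for the
recoverability condition, not a formalization of it.

```python
def move_np(structure, respect_recoverability):
    """Apply "move any NP anywhere" once, in every possible way."""
    results = []
    np_positions = [i for i, (cat, _) in enumerate(structure) if cat == "NP"]
    for src in np_positions:
        if structure[src][1] is None:
            continue                      # nothing to move out of an empty position
        for dst in np_positions:
            if dst == src:
                continue
            if respect_recoverability and structure[dst][1] is not None:
                continue                  # landing site is filled: content would be lost
            moved = list(structure)
            moved[dst] = ("NP", structure[src][1])
            moved[src] = ("NP", None)     # the moved NP leaves an empty position behind
            results.append(moved)
    return results


def show(structure):
    """Render a structure as a string, writing e for an empty position."""
    return " ".join("e" if content is None else content for _, content in structure)


john_saw_bill = [("NP", "John"), ("V", "saw"), ("NP", "Bill")]
seems = [("NP", None), ("V", "seems"), ("NP", "the bear"), ("INF", "to be hungry")]
```

With the constraint switched off, the rule produces the unwanted "Bill saw e" from (20i);
with it switched on, nothing can move in (20i), while the raising case in (19) still goes
through. The rule itself never changes.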

Other misapplications of Move NP are ruled out by other independently motivated
principles of grammar. Modern theories factor out general constraints, maintain simple
formulations of transformational rules, and thus achieve two simplifications. A grammar is
simplified when a general condition is stated once rather than many times in many rules,
and it is shrunk when the removal of detailed specifications from rules causes previously
distinct rules to fall together.

* The recoverability principle itself was once stated in individual rules rather than factored out as a separate
constraint. See Lasnik (1976:3).


5.2.3. GB-theory uses modular subsystems of principles

Modern transformational theory goes by the name of government-binding theory, or
GB-theory, because the technical notions of government and binding play a central role.
Current GB-theory* postulates four grammatically significant levels of description. The
level of D-structure expresses the assignment of θ-roles such as Agent-of-Action to
appropriate constituents. A D-structure position may not exist unless "licensed" in one of a few
ways. D-structure configurations are also constrained by X-bar theory, which is concerned
with the structural relationships between the "head" of a phrase and its various satellites.

D-structure is converted to S-structure through the operation of rules of the form
Move α, where α is a constituent. (Move NP is one subcase.) Movement leaves behind an
empty trace associated with the moved constituent. S-structure is essentially an enriched
version of ordinary surface structure. S-structure representations are mapped
independently to representations in the LF (logical form†) and PF (phonetic form) components.
As currently conceived, the level of LF functions largely to indicate the scope of quantifiers
and similar elements. Various conditions restrict the relationship between a quantifier and
its bound variables at LF. The Empty Category Principle also places requirements on the
distribution of empty categories at LF.

The θ-criterion applies at all linguistic levels and requires (roughly) that each noun
phrase be associated with one and only one θ-role. Since the chain formed by a moved
constituent and its traces is assigned a θ-role as a unit, the θ-criterion acts as one constraint
on movement. The Projection Principle requires representations at various levels to be
fundamentally just projections of lexical items, in the sense that the properties of lexical
items (such as whether or not a verb is transitive) must be represented at each linguistic
level.
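
How the θ-criterion, as roughly paraphrased above, can act as a constraint on movement
is sketched below in Python. The representation is a hypothetical simplification: each
chain (a moved constituent together with its traces) is named by a string, role labels are
invented for illustration, and only the requirement stated above is checked, namely that
every chain bears exactly one role. The assumption that both positions of "saw" receive
a role is also part of the illustration.

```python
from collections import Counter


def satisfies_theta_criterion(chains, assignments):
    """Rough check: every chain is associated with exactly one role.

    chains      -- list of chain names
    assignments -- list of (role, chain) pairs
    """
    counts = Counter(chain for _role, chain in assignments)
    return all(counts[chain] == 1 for chain in chains)


# "the bear seems e to be hungry": the chain picks up a single role,
# on the assumption that "seems" assigns no role to its subject position.
raising_ok = satisfies_theta_criterion(
    ["the bear ... e"],
    [("role-from-hungry", "the bear ... e")],
)

# "Bill saw e": the chain would pick up a role in each of its two positions.
double_role = satisfies_theta_criterion(
    ["Bill ... e"],
    [("role-from-saw-subject", "Bill ... e"), ("role-from-saw-object", "Bill ... e")],
)
```

Because the check is stated over chains rather than inside any particular movement rule,
it applies to every instance of movement without being repeated in the rules themselves.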

Chomsky (1981) briefly describes several subsystems of principles in an introductory
passage:

     The subsystems of principles include [bounding theory, government theory,
     θ-theory, binding theory, Case theory, and control theory]. Bounding
     theory poses locality conditions on certain processes and related items. The
     central notion of government theory is the relation between the head of a
     construction and categories dependent on it. θ-theory is concerned with
     the assignment of thematic roles such as agent-of-action, etc. (henceforth:
     θ-roles). Binding theory is concerned with relations of anaphors
     [referentially dependent elements such as "each other" and NP-trace], pronouns,
     names, and variables to possible antecedents. Case theory deals with
     assignment of abstract Case and its morphological realization. Control
     theory determines the potential for reference of the abstract pronominal
     element PRO [which is the subject of the infinitive in a sentence such as
     "I like to watch TV"]. (:5f)

* This greatly condensed summary is based on Chomsky (1981) and on Chomsky's Fall 1983 class lectures.

† "Logical form" is used as a technical term within GB-theory. In this context, the ordinary meaning of
the term is only suggestive and can be misleading. Representations at the LF level do not carry all of
the information that is relevant to logical form in other senses. For example, the LF representation of a
sentence according to GB-theory is not directly relevant to determining the logical validity of inferences
that might be drawn from the sentence; similarly, the occurrence of a quantified variable at the LF level
carries no ontological commitment, despite the famous dictum that to be is to be the value of a bound
variable. See Chomsky (1981:17).

[Bjinding mid Case theory can be developed within tlic framework of gov- 
ernment theory, and . . . Case and (?-theory are closely uiterconnected. 
Certain notions, such as c-command, seem to be central to several of 
these theories. Fm-thermore, [these subsystems] interact: e.g., bounding 
theory holds of the rule Move-« (ie., of antecedent-trace relations) but 
not of other mitecculent-anaphor relations of binding and control theory. 
Each of [the subsystems] is based on principles with certain possibilities 
of parametric variation. Through the interaction of these systems, many 

properties of i)articuhu: languages can be accounted for Ideally, we 

hope to find that complexes of properties differentiating otherwise simi- 
lar languages are reducible to a single parameter, fixed in one or another 
way .... (:6) 
Obviously, this is not a complete hitroduction to CB-theory, but it should suggest tlie nature 
of the constraining principles that are involved in GB-theory and its vai-iants. 

5.2.4. A detailed Passive rule is no longer necessary 

As an example, consider the detailed Passive rule (5). In modern terms, the old rule is
not a separate rule of grammar, but merely one subcase of Move NP. The details mentioned
in the old rule result from the interaction of various principles.

Passivization can't apply with active verbs because an active verb assigns a θ-role to
its subject. The θ-criterion forbids a position that receives a θ-role from being empty at
D-structure, so recoverability of deletion will prevent the object from moving into subject
position.

Passivization must leave an empty trace behind because all movement rules do. It isn't
necessary to stipulate that fact as a property of Move NP. (If there were no trace, a moved
NP would lose its θ-role and violate the θ-criterion.)

Passivization is obligatory with passive verbs because of a principle of Case Theory
that requires a noun phrase to have some case such as nominative or objective assigned to
it. According to modern theory, passive participles do not assign case. The noun phrase in
object position must move to a case-assigning position.

Various other details also follow. The copula be is required with passive verb phrases
because they are thought to have the categorial status not of an ordinary verb phrase, but
of a neutralized category intermediate between verb phrase and adjective. Subjacency, the
major principle of bounding theory, rules out some other improper movements.

5.3. Factoring constraints out of grammars 

The shift from detailed rules to systems of principles has also strengthened the theory
of universal grammar. In addition to factoring constraints out of individual rules,
syntactic theories can factor some constraints out of grammars entirely. Many constraints
are thought to hold for all natural languages and hence do not need to be stated in the
descriptions of individual languages such as English and French.

When the properties of universal grammar have been factored out, the specification
of the "core syntactic structure" of a language amounts to no more than a selection of
particular values for parameters from a small list:

    Universal grammar will provide a finite set of parameters, each with a
    finite number of values, apart from the trivial matter of the morpheme or
    word list, which must surely be learned by direct exposure for the most
    part. (Chomsky, 1981:11)

These parameter settings interact with various principles to yield the language-particular
effects that were attributed in earlier theories to the operation of detailed
language-particular rules:

    Languages may select from among the devices of universal grammar,
    setting the parameters in one or another way, to provide for such general
    processes as those that were considered to be specific rules in earlier work.
    At the same time, phenomena that appear to be related may prove to
    arise from the interaction of several components, some shared, accounting
    for the similarity. The full range of properties of some construction may
    often result from interaction of several components, its apparent
    complexity reducible to simple principles of separate subsystems. This modular
    character of grammar will be repeatedly illustrated . . . . (:7)

In effect, recent theories can derive from deeper principles many syntactic facts that were 
merely written down (in the form of rules) in previous theories. 

In linguistics, a theory of universal grammar that allows for only limited, parametric
variation in basic structure from one language to another has three major advantages over
one that allows a wide variety of complex, language-specific rule systems. First, it makes
stronger claims about the nature of natural languages, hence is preferred (if true) on
general scientific grounds. Second, it limits the amount of information that is needed to
characterize the structure of a language, hence can help make it possible to explain how a
language can be acquired by children on the basis of limited evidence. Third, in cases
where it can derive details of "rules" from general principles, it provides a better
explanation for those details than a theory that simply writes them down.
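
A toy sketch can illustrate how a language description shrinks to a handful of parameter values. It assumes a single hypothetical parameter (head direction) and one shared ordering function standing in for a universal principle. Neither the parameter name nor the function comes from the memo, and actual parametric proposals are far richer than this.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Parameters:
    """Hypothetical language description: nothing but parameter settings."""
    head_initial: bool


def linearize(head: str, complements: List[str], params: Parameters) -> List[str]:
    """Shared 'principle': a head and its complements form one phrase.

    The parameter only decides on which side of the complements the head appears.
    """
    return [head] + complements if params.head_initial else complements + [head]


# Illustrative settings (an assumption of this sketch): English verb phrases are
# head-initial, Japanese verb phrases are head-final.
ENGLISH = Parameters(head_initial=True)
JAPANESE = Parameters(head_initial=False)

print(linearize("V", ["NP"], ENGLISH))   # ['V', 'NP']
print(linearize("V", ["NP"], JAPANESE))  # ['NP', 'V']
```

The two "grammars" differ in one bit, while the ordering logic is stated once and shared. A rule-based description would instead list the verb-phrase order separately for each language.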

5.4. Correcting the deficiencies of complex rule systems 

§§5.2 and 5.3 suggest that by untangling the effects of separate underlying subsystems
of grammar, modern theories of grammar have come closer to uncovering the principles
that form the true basis of syntactic knowledge. The new theories postulate simple,
restricted rules instead of complicated ones drawn from an unrestricted framework. They
view syntactic variation from language to language as characterized by a small number of
parameters rather than a large body of detailed rules. Where possible, they have factored
out general conditions both from rules and from language descriptions.

The development of modular theories has cured many of the scientific ills of earlier
theories of grammar. Fundamental principles support better explanations than stipulative
rules. Separating different grammatical components from one another allows stronger
constraints and more sweeping simplifications within each component. The theory of limited
parametric variation condenses language descriptions to a small size and makes language
acquisition seem possible.

6. Replicating the shift to modular syntax 

Old-style grammatical theories and current natural-language parsers both use
complicated rule systems to describe the syntactic structures of languages. Such rule
systems have been superseded in grammatical theory because of scientific shortcomings, and
for corresponding reasons they cause difficulties in parser design as well.

The cure for these engineering maladies should be the same in parser design as it was
in linguistic theory. The shift to modular theories of syntax should be replicated in
parsing practice. Such a shift would make it easier to design natural-language systems
because it would shorten and simplify the necessary underlying descriptions of particular
languages.

6.1. Rules and principles in parsing 

A successful branch of linguistic theory has largely abandoned complicated,
language-specific rule systems in favor of simpler subsystems of principles that can
account for many of the same facts. Given the engineering disadvantages of old-style rule
systems, why hasn't parser design already followed suit? The answer lies partly in the fact
that there are no well-understood ways of using the new modular linguistic theories
concretely in the processing of sentences.

It is fairly clear how to embed a context-free grammar in a parser; many parsing
methods for such grammars have been developed. More generally, it is often easy to imagine
many ways to base a parser on a system of rules that is explicit about such matters as the
surface order and composition of the constituents of various constructions. In many cases
the rules can be put to use in relatively direct fashion for the recovery of syntactic
structure.

For example, an SLR(0) parser (Aho and Ullman, 1977) can be said to use context-free
grammar rules rather directly because it operates by simply tracing through the grammar
rules, placing dots in the rules to indicate its position. Since a context-free grammar
explicitly spells out the order and type of constituents in various constructions, it is a
simple matter to keep track of what is expected next:

• If the item A => B.aC is currently one of the possible descriptions of progress so
  far through the input, this means that a phrase of type A is expected and its first
  constituent, a phrase of type B, has already been processed. An a can be expected
  next; if the next input symbol that is read is indeed an a, the item is advanced to
  read A => Ba.C.

• When an item with a dot at the end becomes current, it means that the end of an
  expected constituent has been reached; if A => BaC. is a current item, then any
  item of the form P => Q.AR that was current before the A-phrase was sought
  should now be advanced to read P => QA.R.

• When an item with a dot before a phrase symbol such as A becomes current, it
  means that a phrase of the indicated type is expected next. The rules expanding
  that phrase type are consulted to start the parser off on its path through the
  expected phrase; when P => Q.AR was first current, the expected A-phrase
  would have been sought through activation of the item A => .BaC.

Although I have suppressed many details in this description of SLR(0) parser operation, it
should be clear that the SLR(0) parsing method makes direct use of the information about
constituent order and constituent type that is spelled out in context-free grammar rules.
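
The three bookkeeping steps just described can be sketched directly on dotted items. This is a minimal illustration, not a complete SLR parser: it has no state table, no lookahead, and no conflict handling. The names `Item`, `advance_over`, `complete`, and `predict`, and the two-rule toy grammar, are assumptions introduced here using the symbols from the text.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Item:
    """A grammar rule with a dot marking how much of it has been recognized."""
    lhs: str
    rhs: Tuple[str, ...]
    dot: int = 0

    def next_symbol(self) -> Optional[str]:
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

    def advance(self) -> "Item":
        return Item(self.lhs, self.rhs, self.dot + 1)

    def __str__(self) -> str:
        symbols = list(self.rhs)
        symbols.insert(self.dot, ".")
        return f"{self.lhs} => {' '.join(symbols)}"


# Hypothetical toy grammar built from the symbols used in the text.
GRAMMAR: Dict[str, List[Tuple[str, ...]]] = {
    "P": [("Q", "A", "R")],
    "A": [("B", "a", "C")],
}


def advance_over(item: Item, symbol: str) -> Optional[Item]:
    """First bullet: move the dot if the expected symbol has just been seen."""
    return item.advance() if item.next_symbol() == symbol else None


def complete(finished: Item, waiting: List[Item]) -> List[Item]:
    """Second bullet: a dot at the end advances the items that sought this phrase."""
    if finished.next_symbol() is not None:
        return []
    return [w.advance() for w in waiting if w.next_symbol() == finished.lhs]


def predict(item: Item) -> List[Item]:
    """Third bullet: a dot before a phrase symbol activates that phrase's rules."""
    return [Item(item.next_symbol(), rhs) for rhs in GRAMMAR.get(item.next_symbol(), [])]


waiting = Item("P", ("Q", "A", "R"), dot=1)   # P => Q . A R
item = predict(waiting)[0]                     # A => . B a C
for symbol in ("B", "a", "C"):                 # B and C are treated as already recognized
    item = advance_over(item, symbol)
print(item)                                    # A => B a C .
print(complete(item, [waiting])[0])            # P => Q A . R
```

Every step reads constituent order straight off the right-hand side of a rule, which is exactly the information a principle-based theory does not state explicitly.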

It is less clear how to implement a parser for a linguistic theory in which constituent
order and constituent type in a construction are not explicitly spelled out, but follow
instead from the interaction of various general principles and requirements. Such a theory
does not directly say what various constructions look like on the surface; indeed, it would
be redundant for the theory to do so, since it derives surface characteristics from other
principles. As a consequence, it is more difficult to see how the parser can bridge the gap
between structure and surface appearance.^9 Few implementation models are known for
the new modular, principle-based theories of grammar. The models that do exist use the
principles of grammar only indirectly.

Berwick and Weinberg (1984), for instance, point out that the Marcus parser can be
considered an implementation of a recent linguistic theory because it uses similar
representations and mimics similar constraints. However, the rules and organization of the
Marcus parser do not correspond directly to those proposed by theorists. "Metarule" systems
such as that of Gazdar (1981) can also be used to implement new-style transformational
theories.^10 However, the function of metarules is the precomputation of a large set of
ordinary context-free rules. It is the context-free rules rather than the underlying
grammatical principles that are then put to use in sentence processing. A metarule
implementation of a new-style theory destroys its modular character by multiplying out the
surface consequences of its various components. The context-free "object grammar" that
results from applying metarules to a context-free base can be quite huge, containing
"literally trillions of rules," in the words of Shieber (1983:4).

6.2. A research proposal 

Computational linguistics should fill this gap in our understanding of how to put
linguistic knowledge to use in sentence processing. Researchers should replicate in parsing
practice the shift to modular, principle-based theories of syntax. According to recent
linguistic theory, complicated, language-specific rule systems do not form an important
part of a person's syntactic knowledge. Perhaps, then, such systems need not form the basis
for the recovery of syntactic structure.

The research program that is proposed here seeks to discover how to base a parser on
interacting principles and parameters rather than on rules that individually stipulate the
details of their operation. It should be possible to use many of the principles "directly"
in parsing, without precomputing their effects. If parser operation were based on
linguistic principles rather than large sets of stipulations, results from universal
grammar could shorten and simplify the language descriptions used in natural-language
parsing. Parser design as well as linguistic theory would be able to view syntactic
variation from language to language as characterized by a small number of parameters rather
than a large body of detailed rules. The notion of relatively direct realization of a
grammatical theory could also be clarified.

^9 Whatever effect that fact has on the difficulty of parser design, however, it does not
imply that parsing will be "less efficient" than with a surface-oriented system. That
question could come out either way.

^10 This is not the interpretation that proponents of metarule systems intend.

More concretely, a parser could use principles rather than stipulative rules to detect
the site of NP-movement in a passive sentence. It would insert a trace after the passive
participle not because the grammar writer had written a language-particular rule that
explicitly directed it to do so, but because it in some way directly respected the
principles of case and θ-role assignment that force the conclusion that the post-participial
position must have been a movement site.
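
To make this passive example concrete, here is a minimal sketch of how a decision about the post-verbal position might follow from case and θ-role facts alone, with no Passive rule. It assumes a hypothetical three-entry lexicon with two boolean features per verb form, and it ignores the auxiliary be. The names `VerbForm`, `object_position`, and `analyze` are illustrative and are not drawn from the memo or from any existing system.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class VerbForm:
    word: str
    theta_to_object: bool  # does the verb assign a theta-role to an object position?
    assigns_case: bool     # does this particular form assign case to that position?


# Hypothetical mini-lexicon. Following the text, the passive participle keeps the
# object theta-role but does not assign case.
LEXICON: Dict[str, VerbForm] = {
    "kicked[active]": VerbForm("kicked", theta_to_object=True, assigns_case=True),
    "kicked[passive]": VerbForm("kicked", theta_to_object=True, assigns_case=False),
    "slept[active]": VerbForm("slept", theta_to_object=False, assigns_case=False),
}


def object_position(verb: VerbForm) -> str:
    """Decide what may occupy the post-verbal position from case and theta facts only."""
    if not verb.theta_to_object:
        return "none"      # no theta-role, so no object position is licensed
    if verb.assigns_case:
        return "overt NP"  # an overt noun phrase can receive case here
    return "trace"         # a role but no case: the noun phrase must have moved away


def analyze(subject: str, verb_key: str, obj: Optional[str] = None) -> List[str]:
    """Return a flat analysis, inserting a coindexed trace when the principles force one."""
    verb = LEXICON[verb_key]
    slot = object_position(verb)
    if slot == "trace":
        return [subject + "_i", verb.word, "t_i"]
    if slot == "overt NP":
        if obj is None:
            raise ValueError("object theta-role would be left unassigned")
        return [subject, verb.word, obj]
    return [subject, verb.word]


print(analyze("the ball", "kicked[passive]"))       # ['the ball_i', 'kicked', 't_i']
print(analyze("Kim", "kicked[active]", "the ball"))  # ['Kim', 'kicked', 'the ball']
```

No function here mentions passives. The trace appears because one lexical feature differs and the same two general checks apply to every verb form.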

6.3. Characterizing the proposed research program 

The research program that is proposed here should draw on methods and results in 
several intellectual disciplines: 

• It is a problem in applied computer science to investigate appropriate
  implementations of linguistic theories. In computer science, an abstract object is
  characterized by the set of operations defined on it. A description of the
  representations, principles, and rules that a linguistic theory postulates can serve as
  the specification for a family of abstract objects that help implement a parsing model
  for the theory.

• It is a problem in applied linguistic theory to take an account of the speaker's
  knowledge of language and put it to use in recovering the syntactic structure of
  sentences.

• It is a problem in engineering to try to improve the performance of natural-language
  processing systems. A parser that is based on an accurate and explanatory theory of
  linguistic structure has a better chance of accurately recovering that structure than
  a parser that is based on a large set of complicated rules. (There is no advantage,
  however, unless the implementation is both faithful and computationally practical.)

• It is a traditional goal in artificial intelligence to work toward systems that can
  learn rules instead of having them all built in. Learnability is an explicit concern
  in modern generative linguistics, and there is more hope of setting from experience
  the values of tightly constrained linguistic parameters than of inductively building
  up a complex set of rules.

• It expands the realm of parsing theory to propose new parsing algorithms. As noted,
  most current parsers are driven by sets of rules that directly specify constituent
  order. In contrast, the proposed new parser is to be driven by sets of principles
  that indirectly determine constituent order.

• It is of interest in cognitive psychology to propose new models of how knowledge of
  language can be put to use in sentence processing. Each new parsing algorithm is
  potentially a new candidate for a model of how humans process language.

• It is of some interest in linguistics to discover in what respects a theory does and
  does not suffice to determine the structure of sentences. A parser implementation
  that actually processes sentences cannot help but shed light on this question.

The entire project amounts to taking modern linguistic theory seriously, toward a variety
of ends.

6.4. Encouraging anecdotes 

The ultimate success of this line of research cannot be predicted ahead of time.
However, there is anecdotal evidence that the effort to factor out underlying principles
instead of describing their surface effects can in fact yield engineering benefits as
expected. Small is beautiful when it comes to the amount of information a parser designer
must specify in order to parse a new language, and these examples show that a shift toward
modular organization can indeed decrease the size and complexity of a parsing system. When
independent but interacting underlying principles are involved, a rule system that
multiplies out their surface effects is clumsy.^11

6.4.1. Factoring out Aux-Inversion makes Marcus's question rules simple 

Redundancy in a rule system often signals that the process of factoring surface
appearances into underlying principles is not complete. The ability of modular factoring to
reduce redundancy is illustrated nicely by the contrast between Robinson's and Marcus's
treatments of yes/no questions. Consider the rules for yes/no questions in the DIAGRAM
parsing system of Robinson (1982):

(21) SQ = BEP NP ((ADV) (ING "BE") PRED)
     SQ = MODALP NP (ADV) PPL "BE" ((ING "BE") PRED)
     SQ = HAVEP NP (ADV) PPL "BE" ((ING "BE") PRED)
     SQ = DOP NP (ADV) VP
     SQ = MODALP NP (ADV) (HAVEP PPL) (BEP ING) VP
     SQ = BEP NP (NOT) ING VP
     SQ = BEP "THERE" (NP) ([ING ["BE" PRED1 / VP] / PRED2 / SREL])
     SQ = MODALP "THERE" (NOT) (HAVEP PPL) BEP
          (NP) ([ING ["BE" PRED1 / VP] / PRED2 / SREL])
     SQ = HAVEP "THERE" PPL "BE" (NP)
          ([ING ["BE" PRED1 / VP] / PRED2 / SREL])

There is much redundancy in the statement of question rules themselves, and there is even
more when a few of the completely separate rules involved in the corresponding declaratives
are considered:

(22) SDEC = "THERE" (AUX) BEP NP ([ING [VP / "BE" PRED1] / PRED2 / SREL])
     SDEC = NP (AUX) (ADV1) BEP (ADV2) (PRED)
     AUX = (MODALP) (HAVEP PPL) (BEP ING)
     SDEC = NP (ADV) (AUXD) VP
     AUXD = [AUX / DOP]

^11 The complexity that results from using a non-modular, surface-oriented framework
reaches a striking extreme in the parsing system described by Sager (1981). Based on Zellig
Harris's structuralist string-analysis framework and twenty years in the making, the system
uses a plethora of rules and category types. It uses at least 100 expansions for the
supposed category "object."

Robinson (1982:32) seems aware of some aspects of the redundancy problem:

    [W]e feel that there is some loss of generality in writing so many
    separate rules that have so many elements in common, and we are therefore
    exploring the possibility of deriving some rules from other rules.
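
The cost of spelling out surface configurations can be made concrete by multiplying out the optional groups in a single rule. The sketch below takes the AUX rule from (22), treats each parenthesized group as independently optional, and enumerates the flat expansions it abbreviates. The function name `expand_optionals` and the tuple encoding are assumptions of this sketch, not part of DIAGRAM.

```python
from itertools import product
from typing import List, Sequence, Tuple

# AUX = (MODALP) (HAVEP PPL) (BEP ING): three independently optional groups.
AUX_GROUPS: List[Tuple[str, ...]] = [("MODALP",), ("HAVEP", "PPL"), ("BEP", "ING")]


def expand_optionals(groups: Sequence[Tuple[str, ...]]) -> List[Tuple[str, ...]]:
    """Multiply out every present/absent choice for each optional group."""
    expansions: List[Tuple[str, ...]] = []
    for choices in product([False, True], repeat=len(groups)):
        symbols: Tuple[str, ...] = ()
        for present, group in zip(choices, groups):
            if present:
                symbols += group
        expansions.append(symbols)
    return expansions


aux_expansions = expand_optionals(AUX_GROUPS)
print(len(aux_expansions))  # 8
```

With k independent optional groups the count is 2^k. A rule set that restates such combinations separately for each sentence type repeats the same choices many times over, whereas a factored description states each option once.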

In contrast, Marcus (1980) is not constrained by his rule framework to spell out surface
configurations, and he is able to capture many of the consequences of Robinson's complex
rule set for questions by supplementing his independently required rules for actives with
two simple parsing rules that are related to the traditional transformation of
Aux-Inversion:

(23) {rule YES-NO-Q in ss-start
         [=auxverb] [=np] -->
         Label c s, quest, ynquest, major.
         Deactivate ss-start. Activate parse-subj.}

     {rule AUX-INVERSION in parse-subj
         [=auxverb] [=np] -->
         Attach 2nd to c as np.
         Deactivate parse-subj. Activate parse-aux.}

In the Marcus parser, as in transformational grammar, surface constituent order in yes/no
questions is taken to be derivative. The surface order is not to be spelled out directly
with non-modular rules like those in Robinson's grammar. Rather, the word order of
yes/no questions results when the processes that determine word order in declaratives are
perturbed by a separate process that Marcus factors out as a simple pair of rules.
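
The effect of the two rules in (23) can be imitated with a small buffer-based sketch. It is a heavy simplification under stated assumptions: the real Marcus parser has a node stack, rule priorities, and a bounded lookahead buffer, none of which is modeled here. The names `State`, `matches`, `yes_no_q`, and `aux_inversion` are hypothetical stand-ins.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Cell = Tuple[str, str]  # (category, words), e.g. ("auxverb", "did")


@dataclass
class State:
    buffer: List[Cell]                                  # constituents awaiting attachment
    packets: Set[str]                                   # currently active rule packets
    features: Set[str] = field(default_factory=set)     # labels on the current node c
    attached: List[Cell] = field(default_factory=list)  # children of the current node c


def matches(state: State, packet: str, pattern: List[str]) -> bool:
    """A rule fires only if its packet is active and the buffer starts with its pattern."""
    categories = [category for category, _ in state.buffer[:len(pattern)]]
    return packet in state.packets and categories == pattern


def yes_no_q(state: State) -> bool:
    if not matches(state, "ss-start", ["auxverb", "np"]):
        return False
    state.features |= {"s", "quest", "ynquest", "major"}
    state.packets.discard("ss-start")
    state.packets.add("parse-subj")
    return True


def aux_inversion(state: State) -> bool:
    if not matches(state, "parse-subj", ["auxverb", "np"]):
        return False
    state.attached.append(state.buffer.pop(1))  # attach the 2nd buffer cell as subject
    state.packets.discard("parse-subj")
    state.packets.add("parse-aux")
    return True


question = State(buffer=[("auxverb", "did"), ("np", "John"), ("verb", "leave")],
                 packets={"ss-start"})
yes_no_q(question)
aux_inversion(question)
print(question.attached)  # [('np', 'John')]
print(question.buffer)    # [('auxverb', 'did'), ('verb', 'leave')]
```

After the two rules fire, the subject is attached and the buffer begins with the auxiliary, just as it would after the subject of the declarative "John did leave" had been attached. The ordinary auxiliary and verb-phrase rules can then proceed without any question-specific variants.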

6.4.2. Separating syntax from semantics also simplifies rule systems 

The complexity of a parsing system can also be reduced when knowledge of linguistic
form and knowledge of the world are separated into different modules. On the theoretical
side, Grimshaw (1979) showed that mixing syntactic and semantic requirements led to a
larger overall description of the relationship between verbs and their complements. On the
practical side, mixing the semantic component of a program into its syntactic rule system
will typically make it necessary to duplicate the same semantic tests and actions in the
rules that parse all the syntactically distinct ways of expressing roughly the same idea.
Robinson (1982) mentions the undesirability of duplicating a single semantic action in
several different grammar rules, and the experience of Bates with semantico-syntactic and
purely syntactic ATN systems also seems to support this point:

    Semantic grammars tend to be much larger than syntactic grammars
    which accept the same set of sentences. The largest ATN grammar this
    author knows of is one she wrote for the BBN speech understanding
    system . . .; it contained 448 states, 881 arcs, and 2280 actions but was more
    limited in the variety of constructions it could accept than an 83 state, 202
    arc, 386 action syntactic grammar for the same system. (Bates, 1978:238)

Another practical advantage of keeping syntax separate from semantics is that - once
again - it saves the parser designer from the task of explicitly working out the
interactions and writing them into the rules. Bates continues:

    Another drawback to a semantic grammar is that it must be written anew
    if the domain of discourse is changed, and it would be extremely
    impractical to attempt to write such a grammar for anything but a limited
    application area. (:238)

Evidently a modular approach leads to a system that is both less bulky and easier to modify. 

7. Some possible characteristics of the proposed parser 

This section suggests some tentative choices for the design of the parser. Some choices 
must be made in order to preserve the benefits of modular syntactic theories. Others are 
not forced and thus represent only one point in the spectrum of possible research strategies. 

7.1. Parsing under the control of principles of grammar 

The general function of a parser for a grammar G is to "understand sentences in the
manner of G."^12 The parser carries out this function by assigning structural descriptions
"under the control" of principles of grammar. This characterization of parser operation
leaves open a spectrum of options for the degree of directness of "control" by the
grammar. At one extreme of the spectrum, control by the grammar might amount only to the
imposition of an input/output constraint.^13 The operation of the parser would then be
determined largely by performance principles distinct from rules of grammar. At the
opposite extreme, the grammar might force each internal step of parser operation. The
contribution of performance principles would then be much smaller.

The spectrum of directness of control corresponds to a related spectrum of directness of
constraint on internal representations. At one end of this spectrum, the requirement that
the parser "understand sentences in the manner of G" might be enforced only at parser
output. The operations of the parser would be allowed to build internal representations
that did not satisfy the principles of G so long as they did not surface at the output. At
the opposite end, the requirement might be enforced incrementally as invariant constraints
on internal representations. The structure-building operations of the parser would be
constrained so as to build representations satisfying the principles of G. Satisfaction of
the output constraint would follow as a special case.

^12 This phrase is from Miller and Chomsky (1963).

^13 The impractical "British Museum algorithm" of generating all possible structural
descriptions and using the grammar to rule out those that are unsatisfactory falls close to
this end of the spectrum, but the kind of "unnatural" parsing algorithm described by Aho
and Ullman (1972:272) falls even closer, since it does not use the grammar at all, but only
happens to produce the right derivations.
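
The two ends of this spectrum can be contrasted on a deliberately artificial problem. In the sketch below, balanced bracket strings stand in for "structures satisfying the principles of G". This is an assumption made purely for illustration, and the function names are hypothetical. One routine builds every candidate and filters at the output; the other maintains an invariant on every partial structure.

```python
from itertools import product
from typing import List, Tuple


def well_formed(s: str) -> bool:
    """Stand-in for 'satisfies the principles of G': the brackets are balanced."""
    depth = 0
    for ch in s:
        depth += 1 if ch == "(" else -1
        if depth < 0:
            return False
    return depth == 0


def output_filter(n: int) -> Tuple[List[str], int]:
    """Build every candidate of length n, enforcing the requirement only at the output."""
    candidates = ["".join(p) for p in product("()", repeat=n)]
    return [c for c in candidates if well_formed(c)], len(candidates)


def incremental(n: int) -> Tuple[List[str], int]:
    """Enforce an invariant on each partial structure while it is being built."""
    results: List[str] = []

    def extend(prefix: str, depth: int) -> None:
        if len(prefix) == n:
            results.append(prefix)
            return
        remaining = n - len(prefix)
        if depth + 1 <= remaining - 1:  # a new opener can still be closed later
            extend(prefix + "(", depth + 1)
        if depth > 0:                   # a closer must match an earlier opener
            extend(prefix + ")", depth - 1)

    extend("", 0)
    return results, len(results)


filtered, built_by_filter = output_filter(6)
constrained, built_incrementally = incremental(6)
print(sorted(filtered) == sorted(constrained))  # True
print(built_by_filter, built_incrementally)     # 64 5
```

Both routines deliver the same outputs, but only the second guarantees that no ill-formed complete structure is ever built. In it, satisfaction of the output requirement follows from the invariant rather than from a final check.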

Control and constraint of a parser by the principles of universal and particular
grammar can thus be direct or indirect. The goal of translating into parser design the
benefits of modern principle-based linguistic theories suggests that a relatively direct
implementation might be appropriate. One can envision a parser that is a direct
implementation of GB-theory (Chomsky, 1981) in that it recovers the linguistic descriptions
that GB-theory proposes, it recovers them by actively using the principles that GB-theory
holds to characterize and define them, and it uses those principles without "multiplying
out" their effects to produce an intermediate set of phrase-structure rules. In terms of
Fodor, Bever, and Garrett's (1974) typology of parsing methods, it would probably use
elements of both analysis by analysis and analysis by synthesis.

7.2. Preserving the structure of grammatical theory 

A certain amount of directness is required in an implementation that strives to retain
the benefits of a modular grammatical theory:

• The distinction between language-independent universals and language-particular
  parameters must be preserved if descriptions of individual languages are to remain
  small.

• Other aspects of the modular structure of grammatical theory must be preserved
  if the combinatorial consequences of multiplying out interaction effects are to be
  avoided.

• The set of linguistically significant operations should be preserved in parsing
  representations so that the parser designer cannot accidentally write rules that refer to
  predicates that are linguistically nonsignificant. For example, the parsing framework
  should not lead the grammar rules to place much more importance on the
  left versus right distinction or the notion of string position than is warranted by
  the syntax of natural languages.

• It is desirable for the structure of explanations to be mapped over intact from 
grammatical theory to parser operation. If certain grammatical principles force 
the assignment of a certain structure to a sentence, corresponding implementation 
principles should be responsible for the parser's decision to assign that structure. 

However, see section 8.1.3 for some limits on the degree of directness that it is reasonable
to impose. Making the notion of direct implementation more precise should be a subsidiary
topic of research in this program.

7.3. Avoiding mysteries 

One reason for investigating the possibility of a relatively direct relationship between
grammar and parser is that indirect relationships, though not logically impossible or yet
empirically falsified, nonetheless give rise to what might be regarded as mysteries.

Consider, for example, a metarule account of grammar. On such an account, the
properties of a language can be specified with a small set of context-free base rules plus a
set of metarules to derive new rules from old ones. The parser, however, operates only with
the large set of derived rules.

It is possible to imagine that the human language faculty could be constituted roughly
along the lines of the metarule account. The human parsing mechanism would then be
capable of using an unrestricted set of context-free rules for parsing, but languages
described by "unnatural" sets of context-free rules would never be observed because the
language-acquisition component of the language faculty would never construct such a set.
Applying only at the level of language acquisition, the constraints of universal grammar
would play no role in actual parser operation.

In a way, however, any account that involves translation of restrictive principles into
a less restricted framework reads like a mystery story. If the language-acquisition
component has the option of using powerful computational devices in the rule systems that
it constructs, it seems a mystery why it never uses them. If only a limited range of the
parser's computational abilities is ever needed, it seems a mystery why the parser isn't
tailored to take advantage of the restrictions on its actual computational problem. Of
course, the actual nature of the parser and of the language-acquisition component are
matters for empirical investigation. It might turn out that the correct theory of the human
language faculty is one that seems at first to have mysterious properties. Nevertheless,
the attempt to avoid mysteries is a sufficient reason for trying to investigate the
"direct" rather than the "indirect" line of parser implementations.

Related qualms come to mind about the "parsing strategies" proposed by Fodor, Bever,
and Garrett (1974). If the parameters and principles of grammar are not directly realized in
the parser, one must ask for an explanation of why the structures that the parser recovers
are in accord with those principles.^14 Such a question seems to arise whenever a system
observes principles that play no causal role in its operation.

7.4. Separating competence and performance principles 

In the design of a parser it is desirable to preserve as much as possible the distinction
between competence and performance principles. If performance instructions must be
explicitly written into the language description that the designer of a natural-language
system must write, language descriptions will remain large and complex. In addition to
specifying the parameters that characterize the core syntactic structure of a language, the
system designer will be forced to describe in the rule system the detailed way in which
those parameters relate to surface evidence. As much as possible, the necessary performance
principles should be applied automatically by the parsing machinery.

^^There may be an answer, of course; it is again an empirical question whether realization of
parameters is direct.

Many complicated rules in Marcus's grammar seem to be concerned with questions like
these:

• If an optional constituent isn't attached at the current level, will some higher
syntactic context be able to receive it?

• If an optional constituent is attached at the current level, will some higher
syntactic context be deprived of an obligatory constituent?

• If the wh-comp is used at the current level, will some higher syntactic context be
able to receive the NP that is rejected at this level and left over?

• If the wh-comp isn't used at the current level, will it be possible to use it elsewhere?

These considerations seem amenable to incorporation into the structure of the grammar
interpreter so that they will not need to be explicitly written into grammar rules. If the
grammar interpreter could handle most performance principles systematically and
automatically, without needing explicitly coded rule actions, the burden on the grammar writer
could be reduced. It might become less of a misnomer to call Marcus's rule sets grammars,
if rules came to encode more knowledge of the structure of a language and less about the
details of recovering grammatical structures from local cues available from input strings.

Marcus's diagnostic rules are a case in point. It would be easier to write language
descriptions for Marcus's grammar interpreter if the interpreter were made explicitly aware
of the process of resolving nondeterminism. With Marcus's original grammar interpreter,
nondeterminism is resolved when the grammar writer notices a rule conflict or an overly
general interpretation of a surface cue and then writes a diagnostic rule to distinguish
between two situations by using semantics, examining the active node stack, or using more
lookahead. The grammar interpreter itself treats diagnostic rules like any other rules; it
does not "know" when it has gotten into trouble and should consult some mechanism to
resolve a conflict.

The grammar writer therefore has the burden of foreseeing and resolving surface
ambiguities and interaction effects. This increases the size and complexity of the rule system,
and in practice a few techniques for resolving nondeterminism seem to be repeated over
and over in different grammar rules. If the grammar interpreter recognized conflicts and
invoked explicit resolution procedures, the modularity of the parsing system could be
improved because fewer grammar rules would explicitly perform exotic tests.
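
The idea of an interpreter that recognizes conflicts itself can be illustrated with a small
Python sketch. Everything in it is an assumption made for illustration: the rule format,
the rule names, the dictionary used as parser state, and the toy lookahead resolver are
hypothetical and are not taken from Marcus's interpreter.

```python
from typing import Callable, NamedTuple

# Hypothetical sketch: the rule format, rule names, and resolver below are
# assumptions made for illustration; they are not Marcus's actual interpreter.


class Rule(NamedTuple):
    name: str
    matches: Callable[[dict], bool]   # test against the parser state


def select_rule(rules, state, resolve):
    """Return the rule to run, calling `resolve` only when rules conflict."""
    candidates = [rule for rule in rules if rule.matches(state)]
    if not candidates:
        return None
    if len(candidates) == 1:
        return candidates[0]
    # The interpreter itself notices the conflict and consults a shared
    # resolution procedure, so no individual rule has to encode the test.
    return resolve(candidates, state)


def lookahead_resolver(candidates, state):
    """Toy resolution procedure: inspect one more buffer cell."""
    wanted = "aux-have" if state["buffer"][2] == "taken" else "main-verb-have"
    return next(rule for rule in candidates if rule.name == wanted)


rules = [
    Rule("aux-have", lambda s: s["buffer"][0] == "have"),
    Rule("main-verb-have", lambda s: s["buffer"][0] == "have"),
    Rule("start-np", lambda s: s["buffer"][0] == "the"),
]

question = {"buffer": ["have", "the students", "taken"]}
command = {"buffer": ["have", "the students", "take"]}
print(select_rule(rules, question, lookahead_resolver).name)
print(select_rule(rules, command, lookahead_resolver).name)
```

In this arrangement the two conflicting rules stay simple, and the single resolution
procedure could in principle be reused wherever the same kind of conflict arises.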

7.5. Logical parsing theory 

One possible strategy for designing a parser involves studying what surface cues to
syntactic structure are available in an input sentence before deciding how to use those cues
in guiding the recovery of structure. The possible theoretical level of "logical parsing
theory" would concentrate more than grammatical theory on the nature of the computational
problem that the parser must solve, but it would leave open the question of how the parser
actually goes about using (some or all of) the information in the surface string. In Marr's
framework (§2.2), both grammatical theory and logical parsing theory are part of the
topmost level of computational theory, which both identifies the goal of the computation and
investigates the logic of the strategies by which it may be carried out. Once logical parsing
theory is available, it becomes a computer science problem to devise a detailed parsing
algorithm using some subset of the available structurally relevant information.^^

^^The results of an experiment by Frazier, Clifton, and Randall (1983) suggest that the human
parsing mechanism does not apply all potentially relevant constraints while computing a
structural description. Some constraints are apparently not applied until a later stage.

Given a formalization of crucial principles from linguistic theory, it should be possible to
derive theorems about the surface appearance of underlying constructions. These theorems
can then be put to use in parsing. Some theorems will be ultimately based only on linguistic
universals, without mention of language-specific parameters. These theorems can be used,
where appropriate, to fix the general structure of the parser. For example, bounding theory
can influence the choice of parser architecture because it places limits on the "range of
search" that the architecture must support. In general, the relationship between a universal
principle and its embodiment can be quite indirect without raising questions of language
acquisition and language-description size. If fundamental parameters of cross-linguistic
variation are embodied in highly indirect fashion, however - if, for example, they show
up as variations in the architecture of the parser - it will be necessary to consider closely
how they might be set from experience in language acquisition or concisely stated in parser
design.

One possible topic for logical parsing theory is the appropriateness of various internal
representations that a parser might use. Given a sufficiently developed linguistic theory, it is
possible to investigate how closely a proposed "representation cluster"^^ for the theoretical
objects postulated in the theory conforms to the goal of displaying in a representation all
and only the information that is grammatically relevant according to the theory. How
should key notions such as dominance, adjacency, government, c-command, projection,
subjacency, and the contiguity of constituents be reflected in the parser's representations?
Given a representation and a set of operations, can all S-structures be derived by means of
the operations?

^^This is a term from Liskov (1977).

7.6. A target for parser coverage

The enterprise of constructing a parser always involves decisions about the range of
constructions that are to be correctly processed. Given the nature and purpose of the
research program that is proposed here, a reasonable target would be to handle all syntactic
constructions of "core grammar" that are regarded as fundamental and reasonably
well-understood in discussions of GB-theory. This does not include all of language:

    [I]t is hardly to be expected that what are called "languages" or "dialects"
    or even "idiolects" will conform precisely or perhaps even very closely to
    the systems determined by fixing the parameters of universal grammar.
    This could only happen under idealized conditions that are never realized
    in fact in the real world of heterogeneous speech communities.
    Furthermore, each actual "language" will incorporate a periphery of borrowings,
    historical residues, inventions, and so on, which we can hardly expect to
    - and indeed would not want to - incorporate within a principled
    theory of universal grammar. For such reasons as these, it is reasonable to
    suppose that universal grammar determines a set of core grammars and
    that what is actually represented in the mind of an individual even
    under the idealization to a homogeneous speech community would be a core
    grammar with a periphery of marked elements and constructions.

    Viewed against the reality of what a particular person may have inside his
    head, core grammar is an idealization. From another point of view, what a
    particular person has inside his head is an artifact resulting from the
    interplay of many idiosyncratic factors, as contrasted with the more significant
    reality of universal grammar (an element of shared biological endowment)
    and core grammar (one of the systems derived by fixing the parameters
    of universal grammar in one of the permitted ways). (Chomsky, 1981:8)

Core notions and mechanisms are, however, expected to play a role in determining the
properties of even peripheral constructions:

    We would expect the individually-represented artifact to depart from core
    grammar in two basic respects: (1) because of the heterogeneous
    character of actual experience in real speech communities; (2) because of the
    distinction between core and periphery. The two respects are related, but
    distinguishable. Putting aside the first factor - i.e., assuming the
    idealization to a homogeneous speech community - outside the domain of
    core grammar we do not expect to find chaos. Marked structures have to
    be learned on the basis of slender evidence too, so there should be
    further structure to the system outside of core grammar. We might expect
    that the structure of these further systems relates to the theory of core
    grammar by such devices as relaxing certain conditions of core grammar,
    processes of analogy in some sense to be made precise, and so on (:8)

Even so, it would be premature to tackle the periphery without first devising an
implementation that can handle the core.

In addition to forgoing treatment of peripheral constructions in language, initial stages
of the proposed endeavor should avoid getting mired in several other issues: functional
explanations for linguistic phenomena, the "communicative function of language," "everyday
language," "situated language," and the matters that might be called "semantic issues" in
the broad sense. If tackled too early, these issues cannot fail to impede progress toward the
development of a principle-based parser.



8. Implicit representation 

Section 7.5 raised the question of how grammatically relevant predicates and
conditions might be represented in the proposed parser. One possible answer involves implicitly
embodying some predicates and conditions in the structure of the parser. This preliminary
section explores that possibility.

8.1. Implicit representation 

Under the proposed line of parser development, it is usually not enough for the parser to
observe the principles of grammatical theory. Rather, the principles are to play a relatively
direct role in determining what action the parser should take at each point. In most cases
the principles are to be causally implicated in explanations of parser behavior: it is to be the
principles, and as little else as possible, that are actively used to determine the structural
description to be assigned to an input.


8.1.1. The implementation must preserve the benefits of modularity 

More specifically, however, the exact degree of implementation "directness" to be
achieved is uncertain. Indeed, much remains to be explicated about notions such as "direct
implementation" and "direct use" of principles, and such explication will be part of the
thesis project. Nonetheless, the main constraint on the directness of the proposed parser
implementation is clear: the implementation should preserve enough of the modularity of
modern grammatical theory so that the possible benefits of that modularity will not be lost.

In part, then, the reason why the implementation is to be direct is so that the shift
to modular syntactic theories can reduce the size and complexity of language descriptions
in parser design just as it did in linguistic theory. Given that goal, the most important
modular division to preserve in the mapping from theory to program is the distinction
between language-particular parameters and linguistic universals.

Consider, for example, a parser that produces the syntactic descriptions demanded by
current syntactic theories but whose operation is based on a surface-oriented rule system.
Such a rule system must "multiply out" the surface effects of various principles and
parameters. How much will such a rule system differ from the rule system for a closely related
language? Recall the effects of changing a single parameter in modern grammatical theories:

    In a tightly integrated theory with fairly rich internal structure, change
    in a single parameter may have complex effects, with proliferating
    consequences in various parts of the grammar. Ideally, we hope that complexes
    of properties differentiating otherwise similar languages are reducible to a
    single parameter, fixed in one or another way. (Chomsky, 1981:6)

In accordance with this picture, a single surface-oriented rule expresses not a fundamental
parameter of linguistic variation, but a complex amalgam of different principles and
parameters. Changing a parameter causes large changes in the rule system. The benefits of
modularity are not obtained, for the language descriptions that the system designer must
construct are large and vary greatly from language to language.
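
A small Python sketch may make the "multiplying out" concrete. It is not drawn from any
existing parser: the head-direction parameter, the list of categories, the rule notation,
and the function name `surface_rules` are all assumptions chosen for illustration. The
sketch expands one schematic principle plus one parameter value into the surface-order
rules that a surface-oriented rule system would have to list individually.

```python
# Illustrative sketch only: the parameter, categories, and rule format are
# assumptions made for this example, not part of any existing parser.

CATEGORIES = ["N", "V", "A", "P"]  # hypothetical lexical categories


def surface_rules(head_initial: bool) -> list[str]:
    """Multiply out one head-direction parameter into per-category rules."""
    rules = []
    for cat in CATEGORIES:
        phrase, head, comp = f"{cat}P", cat, "Complement"
        rhs = [head, comp] if head_initial else [comp, head]
        rules.append(f"{phrase} -> {' '.join(rhs)}")
    return rules


head_initial_rules = surface_rules(head_initial=True)
head_final_rules = surface_rules(head_initial=False)

# Flipping a single parameter value changes every multiplied-out rule.
changed = sum(a != b for a, b in zip(head_initial_rules, head_final_rules))
print(head_initial_rules)
print(head_final_rules)
print(f"{changed} of {len(head_initial_rules)} surface rules differ")
```

A language description stated at the level of the parameter differs in one place between
the two hypothetical languages; a description stated at the level of the expanded rules
differs in every rule.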

8.1.2. The implementation must often preserve the structure of explanations 

One characteristic of explanations is that they support counterfactuals. By reducing
situations to their underlying causes, an explanatory theory describes not only what is
true, but also what would be true under a range of different conditions. By identifying
the principles and parameters on which a language-particular phenomenon depends, an
explanatory theory reveals not only why it has its particular characteristics, but how those
characteristics would differ given a different set of underlying parameters.

An interesting corollary of this fact is that preserving the parametric structure of
cross-linguistic variation requires the implementation to preserve the logical structure of
some linguistic explanations. Moving from one language to another amounts to using a
different set of parameters. Generally speaking, the implementation cannot take advantage
of "lemmas" that are derived from grammatical principles and parameters. It should use
the principles and parameters directly, because if it uses a language-particular lemma, the
lemma will have to be stated in the rule system that makes up the description of the
language. A different lemma will have to be stated for a language with different underlying
parameters; language descriptions will thus include more than the logically necessary sets
of parameter values.

8.1.3. Implicit representation is often permitted and desirable 

There are three major cases in which this argument does not hold, however. First,
it does not apply to language-independent lemmas that do not depend on particular
parameter values. Second, it does not apply to certain parametrized lemmas that show how
parametric differences would affect their conclusions. Third, it does not apply when the
parser rather than the system designer is responsible for generating the lemmas; in such
cases the parser is only using the lemmas for "short-cut" access to results it could have
derived from fundamental principles.

The first two cases allow the possibility of implicit representation of grammatical
principles and predicates. These cases represent a situation in which it is acceptable for the
parser to merely act in accordance with principles instead of basing its decisions on them
directly. Language-independent lemmas can be used to fashion the basic architecture and
actions of the parser. In fact, within the constraint of preserving parametric structure, it
is desirable to tailor parser design closely to the general computational problem defined by
universal grammar. Without close tailoring, it becomes a mystery (§7.3) why no language
uses the full power of the implemented parser. The suspicion arises that some property of
universal grammar has been missed that would allow a more efficient or otherwise more
desirable parser architecture.

8.2. Monostrings 

Tailoring a representation to closely fit a desired set of predicates, operations, and
constraints is nothing new to linguistic theory. Indeed, the search for appropriate
representations is a key part of the construction of theories that can explain language acquisition. As
the theories go, the language learner projects from limited experience in one way rather than
another because the human language faculty makes available only a restricted framework
for describing linguistic experience. Children never frame arbitrarily bizarre hypotheses
about the structures of their languages because such hypotheses are not statable with the
available internal vocabulary.

8.2.1. The monostring representation implicitly embodies universal restrictions 

Lasnik and Kupin (1977) have described a restricted transformational framework that
provides an example of the attempt to build the restrictions of universal grammar directly 
into the formalism used for stating rules of grammar: 

    The theory differs from [Chomsky's earliest transformational formalism],
    and more markedly from most current theories, in the extent to which
    restrictions are imposed on descriptive power. Many well-justified and
    linguistically significant limitations on structural description and
    structural change are embodied in the present formalism. . . . In this paper
    we are attempting to present a particular theory of syntax in a precise
    way. Many of the operations describable within other theories cannot be
    expressed within this theory. (:173)

[Tree diagram not reproduced.]    {abcd, S, Xcd, abY}

Figure 6: In the framework of Lasnik and Kupin (1977), the tree on the left could be
described by the set of strings on the right. Except for the terminal string, each string in
the set is a monostring.

Lasnik and Kupin's restrictive formalism helps illuminate the possible nature of the rep- 
resentations used by the human language faculty. Many properties of the set of possible 
transformations would follow from the assumption that the language faculty uses a
representation like the one that Lasnik and Kupin propose.

Lasnik and Kupin use so-called monostrings to capture the hierarchical relationships
that are usually represented with tree diagrams. Each monostring represents a particular
occurrence of a phrasal category. With a monostring representation, a phrase-marker is a set
of strings instead of a tree. Figure 6 gives the monostring representation that corresponds
to a simple tree.

The monostring representation is more closely tailored to certain theories of universal
grammar than a tree representation would be. It fails to represent certain distinctions that
a tree would represent, and therefore it is suited only to a theory of universal grammar in
which those distinctions are never relevant for the description of natural languages. Figure 7
shows two trees that have the same monostring representation. Lasnik and Kupin comment:

    The choice of [the monostring representation], then, constitutes an
    empirical claim about human language. All grammars in this theory will
    necessarily treat [the two trees shown in the figure] identically since they
    have identical representations (1977:178)

[Tree diagrams not reproduced.]    {ab, S, Cb, Ab, aB}

Figure 7: The two trees on the left have the same reduced phrase-marker, shown on the
right. This example is taken from Lasnik and Kupin (1977).

The "pruning" of two identical nodes dominating the same material also follows from the
nature of the monostring representation. A nonbranching node of the same type as its
daughter is "invisible" with that representation. Again an empirical claim is made:

    [A reduced phrase-marker] is essentially a collection of is a statements. An
    is a statement concerns only the relationship between a portion of the
    terminal string and a non-terminal. In that view there is no point in saying a
    particular occurrence of a terminal [stands in the is a relationship to some
    other nonterminal] twice (as [the unpruned tree] apparently does). . . . In
    this theory, pruning thus becomes a non-issue, since the repeated nodes
    never exist to be pruned. There is never a conversion to more tree-like
    objects so the issue never comes up. Thus, the effects of pruning, if indeed
    there are any, are unavoidable. . . . It is important to note that in principle
    a base component could distinguish between [the pruned and unpruned
    trees]. Thus, [by choosing this representation] we are making the claim
    that a transformational component does not require access to all of the
    information inherent in a base component. (:179)

The design of the proposed parser will strive for this kind of close fit between the information 
that is made explicit in the representation and the information that is deemed grammatically
significant by linguistic theory. Mysteries arise when the representation displays a wide 
range of information that the grammatical system never uses. 
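
The way the monostring representation discards tree distinctions can be sketched in
Python. The sketch makes several assumptions that are not in Lasnik and Kupin's paper:
trees are encoded as nested tuples, terminals and category labels are single characters,
and the example trees are reconstructed from the string sets given in Figures 6 and 7
rather than copied from the drawings (in particular, the two Figure 7 trees are assumed
to differ only in whether C dominates A or A dominates C).

```python
# Sketch under stated assumptions: trees are nested tuples (label, child, ...),
# terminals are one-character strings, and the trees below are reconstructed
# from the string sets in Figures 6 and 7, not copied from the drawings.

def reduced_phrase_marker(tree):
    """Return the set of strings that describes `tree`."""
    terminals = []
    spans = []  # (label, start, end) for every nonterminal occurrence

    def walk(node):
        if isinstance(node, str):      # terminal symbol
            terminals.append(node)
            return
        label, *children = node
        start = len(terminals)
        for child in children:
            walk(child)
        spans.append((label, start, len(terminals)))

    walk(tree)
    sentence = "".join(terminals)
    strings = {sentence}
    for label, start, end in spans:
        # A monostring: the terminal string with one constituent replaced
        # by its single nonterminal label.
        strings.add(sentence[:start] + label + sentence[end:])
    return strings


figure6 = ("S", ("X", "a", "b"), ("Y", "c", "d"))
figure7_left = ("S", ("C", ("A", "a")), ("B", "b"))
figure7_right = ("S", ("A", ("C", "a")), ("B", "b"))
unpruned = ("S", ("S", "a", "b"))
pruned = ("S", "a", "b")

print(sorted(reduced_phrase_marker(figure6)))
print(reduced_phrase_marker(figure7_left) == reduced_phrase_marker(figure7_right))
print(reduced_phrase_marker(unpruned) == reduced_phrase_marker(pruned))
```

Because the result is a set, a repeated nonbranching node contributes the same string
twice and simply disappears, and the relative order of two nonbranching nodes over the
same terminal is not recorded at all.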

Note that an implementation is not required to observe insignificant "presentation
details" of linguistic theories. For example, some theorists who might actually prefer to use
Lasnik and Kupin's framework still draw trees for expository convenience. Unlike some of
the information they depict, the trees are not theoretically significant. In a mathematical
sense, the trees are convenient models of linguistically significant statements.




8.2.2. Only a restricted class of transformations can be stated 

Lasnik and Kupin's framework also restricts the class of transformations that can be
stated. Again, the restrictions are intended to embody empirical assumptions:

    All of the definitions and all of the principles of application described below
    are assumed to be part of general linguistic theory, i.e., to be biologically
    based [emphasis added]. (1977:179)

A leftward NP-movement transformation would be stated as (24) in Lasnik and Kupin's
framework: 

(24) (NP NP, (2/1)) 

Explicit variables are not allowed in the statement of a transformation; there are implicit
variables between all consecutive elements and hence no transformation can require two
elements to be adjacent. There are no Boolean combinations of string conditions, clausemate
conditions, or multiple analyzability conditions. Transformations are not marked optional
or obligatory. There can be at most two affected constituents, so an NP-movement
transformation cannot also insert a passive morpheme. There are only a finite number of possible
transformations.

Again, this restrictive framework illustrates the kind of close match between
theoretically permissible operations and representationally expressible operations that should be
heavily used in the proposed parser design.

8.3. Subjacency

The subjacency constraint provides another natural opportunity to implicitly represent
grammatical constraints in the design of a parser. Subjacency is a constraint on movement
that to first approximation forbids moving in one jump across more than one "bounding
category," where NP and S are bounding categories.^^ For example, subjacency (so
formulated) forbids wh-movement from applying to produce (25):

(25) *who do you believe [NP the claim [S that Bill saw e ] ]?

Stated another way, the subjacency constraint requires that movement transformations must 
apply to elements in the same domain or adjacent domains. 
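
The first-approximation formulation can be written out as an explicit check, shown here
as a hedged Python sketch. The function name, the encoding of a movement step as a list
of crossed node labels, and the treatment of the bounding categories as a settable
parameter are assumptions of the sketch; the nodes crossed in (25) are read off its
bracketing by hand rather than computed from a tree.

```python
BOUNDING_CATEGORIES = frozenset({"NP", "S"})  # first approximation, per the text


def violates_subjacency(crossed, bounding=BOUNDING_CATEGORIES):
    """True if one movement step crosses more than one bounding category.

    `crossed` lists the labels of the nodes that the moved element
    leaves behind in a single jump, innermost first.
    """
    return sum(1 for label in crossed if label in bounding) > 1


# (25): moving out of the complex NP in one jump crosses S, then NP.
print(violates_subjacency(["S", "NP"]))   # True
# A step that crosses only a single S is allowed.
print(violates_subjacency(["S"]))         # False
# A hypothetical alternative parameter setting changes the verdict.
print(violates_subjacency(["S", "NP"], bounding=frozenset({"NP", "S-bar"})))  # False
```

A check of this kind is explicit and descriptive; it only states the constraint and does
not by itself show how the constraint could follow from the structure of a parser.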

Marcus (1980, Chapter 6) attempted to show that important subcases of the subjacency
constraint followed naturally from the structure of his grammar interpreter. In turn, the
crucial properties of the grammar interpreter were motivated by its deterministic operation. 
For reasons that I will not detail here, Marcus's arguments do not completely go through. 
However, Berwick and Weinberg (1984) show that a constrained, deterministic parser of a 
certain kind must obey some principle similar to subjacency. 

^^Some modern theories of subjacency, such as the theory described in Chomsky's Fall 1983
class lectures, are more complex.

Fodor (1983) presents two possible treatments of phenomena related to subjacency.
They contrast sharply with the general approach followed by Marcus and by Berwick and
Weinberg. First, considering an ATN hold-cell parsing model, she proposes that "island
constraints" (such as the Complex NP Constraint, derived from subjacency in many current
theories) can be handled by using a structured hold cell and arranging for the parser to
follow "a set of traffic rules to control access to its various levels" (:172).

Second, working in a surface-oriented context-free framework, she suggests that the 
subjacency constraint, if needed at all, should be handled by arranging for the rule system
not to include any rules that would violate it:

    Gazdar has no analog to Subjacency in his system at present, but at least
    for English a comparable effect could be achieved if passage of a slash
    annotation through either an S or an NP node was blocked, but a rule
    was added to allow a slash annotation at the top of a complement clause
    to be cashed out as a trace, and a new path of slashed nodes to be initiated
    with this trace as its filler. For Italian, however, Rizzi's analysis clearly
    requires that transmission of a slashed node be blocked only at the second
    of two cyclic nodes, and this entails . . . [that] the slashed nodes on the
    path would have to be tagged with information about dominating cyclic
    nodes. (Fodor, 1983:191)

In both cases Fodor is effectively suggesting that the subjacency constraint should be
descriptively imposed on top of the constraints (if any) that result from basic parser structure.
The line of parser development that is suggested here rejects such descriptive approaches,
which simply describe the effects of principles and conditions instead of building them
fundamentally into the actions and representations of the parser. Given the desire to base
parser design solidly on linguistic theory, it is better to avoid a parsing framework in which
the parser would work just as well for parsing "unnatural" languages (with properties
unattested in natural languages) as for parsing natural languages.

One can imagine several possible parsing approaches to the subjacency constraint. It is
necessary to give some account of why subjacency treats certain domains as units; perhaps
it will be possible to find some process necessary to other aspects of parser operation that
already treats those domains (and only those) as units. A contrary approach is also possible;
perhaps subjacency domains are not agglomerated as units, but instead the intervening
non-bounding material is dismissed from some relevant local store and hence is not "visible" to
hinder movement. It is desirable to explain why two domains can be involved, not three
or just one.^^ Fodor (above) suggested the "barrier" account in which search-barriers are
inserted into memory stores for some reason. Again a contrasting approach is possible;
perhaps there is not barrier insertion, but rather the temporary dismissal of material that
would be hidden by a barrier.

^^See Berwick and Weinberg (1984) for one explanation, based on the observation that
"grammars can't count." Another possibility might somehow involve a process in which the
parser is trying to relate two partially built structures.

8.4. C-command

Berwick and Weinberg (1984, Chapter 5) provide a final example of implicit
representation. They show how the structure of the Marcus parser can be adjusted so that the
grammatically relevant predicate of c-command (Reinhart, 1976; Aoun and Sportiche, 1983)
does not need to be explicitly computed, but is implicitly available as a by-product of normal
parser operation:

    While traces and nontraces diverge with respect to bounding conditions,
    traces and some of the nontrace categories are similar in that they both
    obey c-command. Traces, pronouns bound by some quantifiers, and lexical
    anaphors must be c-commanded by their antecedents.

    Given that c-command is a basic predicate of the government-binding
    theory, we must be able to compute it from the parsing representation.
    The obvious way to do this would be to use a full tree representation and
    then design an algorithm to compute c-command from it.

    Alternatively, we could build on the fly a representation that makes the
    calculation of c-command computationally trivial. This is the tack that
    we shall take. (Berwick and Weinberg, 1984:173)

    Given [a certain] principle of attachment and node completion, the active
    node stack extensionally represents the c-command predicate; c-command
    need not be separately computed. (:175)

This way of representing grammatically relevant predicates is quite attractive under the
current proposal for parser design, since it represents a very close fit between the set of
grammatically relevant predicates and the design of the parser.
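
For contrast with the implicit approach, the "obvious way" mentioned in the quotation
can be sketched in Python. This is not Berwick and Weinberg's algorithm. It assumes the
common first-branching-node definition of c-command (A c-commands B if neither dominates
the other and the first branching node above A dominates B), and the `Node` class, the
function names, and the example tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(eq=False)
class Node:
    label: str
    children: list["Node"] = field(default_factory=list)
    parent: Optional["Node"] = field(default=None, repr=False)

    def __post_init__(self):
        for child in self.children:
            child.parent = self


def dominates(a: Node, b: Node) -> bool:
    """True if `a` properly dominates `b`."""
    node = b.parent
    while node is not None:
        if node is a:
            return True
        node = node.parent
    return False


def c_commands(a: Node, b: Node) -> bool:
    """Explicit tree search, assuming the first-branching-node definition."""
    if a is b or dominates(a, b) or dominates(b, a):
        return False
    node = a.parent
    while node is not None and len(node.children) < 2:
        node = node.parent          # skip nonbranching nodes
    return node is not None and dominates(node, b)


# Hypothetical example tree: [S NP-subject [VP V NP-object]]
obj = Node("NP-object")
verb = Node("V")
vp = Node("VP", [verb, obj])
subj = Node("NP-subject")
s = Node("S", [subj, vp])

print(c_commands(subj, obj))   # True
print(c_commands(obj, subj))   # False
```

Every call here walks parent links through a complete tree, which is exactly the separate
computation that the implicit, stack-based representation is meant to make unnecessary.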

9. Relation to other work 

As noted, the proposed research program builds on work in linguistics, computer
science, psycholinguistics, and parser design. Chomsky (1981) describes the
government-binding theory of grammar that is to be implemented. Lasnik and Saito (1983) provide
recent revisions. Some variant of the monostring representation of Lasnik and Kupin (1977),
which was discussed in section 8.2, could potentially form the basis for the parser's
representation of hierarchical structure. Stowell (1981) discusses in detail many cases in which
grammatical principles can interact to account for the surface constituent-order facts that
were formerly accounted for with detailed phrase-structure rules. The proposed parser is
to make crucial use of interacting principles to determine surface order.

Aho and Ullman (1972) discuss formal results about so-called covering grammars that
could be useful in clarifying the notions "implementation of a grammatical theory" and
"direct implementation." Liskov et al. (1977) and others have discussed and developed the
notion "implementation of an abstract object." In the proposed research, linguistic theory
will be taken to define a family of abstract objects that it is the parser's job to implement.

Fodor, Bever, and Garrett (1974) survey psycholinguistic results that may (with
caution) be interpreted to describe some properties of human language-processing mechanisms.
More recently, Frazier and Rayner (1982) and Frazier, Clifton, and Randall (1983) have
done experiments that bear on the question of how the human syntactic processor deals
with parsing ambiguities. Questions about such ambiguities were an important factor in the
design of Marcus's (1980) deterministic parser and will probably also significantly influence
the proposed parser design. Works by Seidenberg et al. (1982) and Milne (1983) present
further psycholinguistic discussions.

The work of Marcus (1980) as refined and discussed by Berwick and Weinberg (1984)
could easily play an important role in the proposed research. Of the currently available
parsing models, the modified Marcus model is probably the most closely tailored to the
principles of grammatical theory. However, the model should be further modified so that it
bases its operation on principles and parameters rather than a complex rule system. Milne's
(1983) work is also in the Marcus framework.

The proposed parser would implement grammatical principles more directly than ex- 
isting parsers in the Marcus framework. Berwick and Weinberg point out that the rules 
for their modified Marcus parser are several steps removed from grammatical principles; 
in effect, the rule system expresses derived lemmas as discussed in section 8.1. Works by 
Gazdar (1981) and Fodor (1974) may be useful for contrast, since in many ways they fall 
at the opposite end of the spectrum from the proposed parser. Bachenko et al. (1983), 
Wehrli (1983), and Shieber (1983) sketch parsing models that may be relevant. 

10. A suggested plan for initial research 

One possible plan for the construction of a principle-based parser begins with two com- 
plementary prongs of initial attack. On the one hand, the Marcus parser can be successively 
modified to base its operation on principles and parameters in more and more cases. For ex- 
ample, the first case to be tackled might be the NP-movement case mentioned in section 6.2. 
As this effort proceeds, a working set of concrete examples could be built up. A "map" of 
the logical relationships among various principles and parameters could also be developed. 
A catalog of linguistic principles, constructions, and examples could be compiled. 

On the other hand, efforts should continue to find representations and operations that 
are closely tailored to the predicates, operations, and parameters of grammatical theory. 
This second effort would thus aim for the development of possible alternatives to the Marcus 
framework. While the first-prong approach seeks to modify an existing model of parsing 
decisions and actions, the second approach follows section 7.5 in separating the question of 
what representations and actions the parser might use from the question of how it should 
decide what action to take at each point. The development of logical parsing theory is part 
of this line of research. 

The results of this initial phase of research could support the choice of a particular 
parser design for further development. With a general design chosen, research can focus on 
reducing the amount of information that must be specified for parsing a particular language. 
The ideal limit of such reduction would allow a language to be described for parsing by the 
same set of parameters that specified the characteristics of the language in linguistic theory. 
In the ideal limit, all performance principles and interaction effects would be handled by 
the parser rather than the grammar writer. 
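
As a toy illustration of this ideal limit, the following Python sketch separates a 
language-independent ordering principle from a language-particular parameter setting. 
Everything in it is an assumption made for illustration: the Parameters record, the single 
head_initial parameter, and the linearize function are hypothetical names, not part of the 
proposed parser, which would need many interacting principles rather than one. 

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Parameters:
    """Hypothetical language-particular settings (illustrative only)."""
    head_initial: bool  # True: a head precedes its complement


def linearize(head: str, complement: list[str], params: Parameters) -> list[str]:
    """Language-independent principle (assumed for this sketch):
    a head is adjacent to its complement, on the side the parameter selects."""
    if params.head_initial:
        return [head] + complement
    return complement + [head]


# Two hypothetical languages that differ only in one parameter value.
HEAD_INITIAL = Parameters(head_initial=True)
HEAD_FINAL = Parameters(head_initial=False)

vp_initial = linearize("read", ["the", "book"], HEAD_INITIAL)
vp_final = linearize("read", ["the", "book"], HEAD_FINAL)
```

Under these assumptions, describing a new language for this fragment means supplying one 
parameter value rather than writing a new ordering rule. 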



11. Bibliography 

Aho, A. V., and J. D. Ullman (1972). The Theory of Parsing, Translation, and Compiling. 
Vol. 1: Parsing. Englewood Cliffs, N.J.: Prentice-Hall. 

Aho, A. V., and J. D. Ullman (1977). Principles of Compiler Design. Reading, Mass.: 
Addison-Wesley. 

Aoun, J., and D. Sportiche (1983). "On the Formal Theory of Government," The Linguistic 
Review 2: 211-236. 

Bachenko, J., D. Hindle, and E. Fitzpatrick (1983). "Constraining a Deterministic Parser," 
to appear in Proc. Nat'l. Conf. on Artificial Intelligence (AAAI-83). 

Baddeley, A. D. (1976). The Psychology of Memory. New York: Basic Books. 

Baltin, M. R. (1982). "A Landing Site Theory of Movement Rules," Linguistic Inquiry 13:1, 
1-38. 

Bates, M. (1978). "The Theory and Practice of Augmented Transition Network Grammars," 
in L. Bolc, ed., Natural Language Communication with Computers, Lecture Notes in 
Computer Science 63, 191-254. New York: Springer-Verlag. 

Berwick, R. C., and A. S. Weinberg (1984). The Grammatical Basis of Linguistic Perfor- 
mance. Cambridge, Mass.: M.I.T. Press. 

Chomsky, N. (1965). Aspects of the Theory of Syntax. Cambridge, Mass.: M.I.T. Press. 

Chomsky, N. (1976). "Conditions on Rules of Grammar," Linguistic Analysis 2:4, 303-351. 

Chomsky, N. (1981). Lectures on Government and Binding. Dordrecht, Holland: Foris 
Publications. 

Earley, J. (1970). "An Efficient Context-Free Parsing Algorithm," Comm. ACM 14, 453-460. 

Fiengo, R. (1977). "On Trace Theory," Linguistic Inquiry 8:1, 35-61. 

Fodor, J. A., T. G. Bever, and M. F. Garrett (1974). The Psychology of Language. New 
York: McGraw-Hill. 

Fodor, J. D. (1983). "Phrase Structure Parsing and the Island Constraints," Linguistics 
and Philosophy 6, 163-223. 

Frazier, L., and K. Rayner (1982). "Making and Correcting Errors during Sentence Com- 
prehension: Eye Movements in the Analysis of Structurally Ambiguous Sentences," 
Cognitive Psychology 14, 178-210. 

Frazier, L., C. Clifton, and J. Randall (1983). "Filling Gaps: Decision principles and 
structure in sentence comprehension," Cognition 13, 187-222. 

Gazdar, G. (1981). "Unbounded Dependencies and Coordinate Structure," Linguistic In- 
quiry 12:2, 155-184. 

Greenberg, J. H. (1963). "Some universals of grammar with particular reference to the 
order of meaningful elements," in J. H. Greenberg, ed., Universals of Language, 58-90. 
Cambridge, Mass.: M.I.T. Press. 

Grimshaw, J. (1979). "Complement Selection and the Lexicon," Linguistic Inquiry 10:2, 
279-326. 

Jackendoff, R. (1977). X̄ Syntax: A Study of Phrase Structure. Cambridge, Mass.: M.I.T. 
Press. 

Lasnik, H. (1976). "Remarks on Coreference," Linguistic Analysis 2:1, 1-22. 

Lasnik, H., and J. J. Kupin (1977). "A Restrictive Theory of Transformational Grammar," 
Theoretical Linguistics 4:3, 173-196. 

Lasnik, H., and M. Saito (1983). "On the Nature of Proper Government." Unpublished 
ms., University of Connecticut and Massachusetts Institute of Technology. 

Lightfoot, D. (1982). The Language Lottery: Toward a Biology of Grammars. Cambridge, 
Mass.: M.I.T. Press. 

Liskov, B., A. Snyder, R. Atkinson, and C. Schaffert (1977). "Abstraction Mechanisms in 
CLU," Comm. ACM 20:8, 564-576. 

Marcus, M. P. (1980). A Theory of Syntactic Recognition for Natural Language. Cambridge, 
Mass.: M.I.T. Press. 

Marr, D. (1982). Vision. San Francisco: W. H. Freeman and Company. 

Miller, G. A., and N. Chomsky (1963). "Finitary Models of Language Users," in R. D. 
Luce, R. R. Bush, and E. Galanter, eds., Handbook of Mathematical Psychology, vol. 
II, 419-492. New York: John Wiley and Sons, Inc. 

Milne, R. W. (1983). Resolving Lexical Ambiguity in a Deterministic Parser. Ph.D. thesis, 
University of Edinburgh. 

Newmeyer, F. J. (1980). Linguistic Theory in America. New York: Academic Press. 

Reinhart, T. (1976). The Syntactic Domain of Anaphora. Ph.D. thesis, Department of 
Foreign Literatures and Linguistics, M.I.T., Cambridge, Mass. 

Robinson, J. J. (1982). "DIAGRAM: A Grammar for Dialogues," Comm. ACM 25:1, 
27-47. 

Sager, N. (1981). Natural Language Information Processing: A Computer Grammar of 
English and Its Applications. Reading, Mass.: Addison-Wesley. 

Seidenberg, M. S., M. K. Tanenhaus, J. M. Leiman, and M. Bienkowski (1982). "Auto- 
matic Access of the Meanings of Ambiguous Words in Context: Some Limitations of 
Knowledge-Based Processing," Cognitive Psychology 14, 489-537. 

Shieber, S. M. (1983). "Direct Parsing of ID/LP Grammars." Technical Report 291R, SRI 
International, Menlo Park, California. Also appears in Linguistics and Philosophy 7:2. 

Stowell, T. (1981). Origins of Phrase Structure. Ph.D. thesis, Department of Linguistics 
and Philosophy, M.I.T., Cambridge, Mass. 
